This is an R Markdown document that summarises the main figures and data in the ErythroCite database, which was compiled through a systematic map of the literature on the size of fish red blood cells.
Size is a fundamental trait in biology, and cell size plays a key role in cellular functions, influencing physiological adaptations and evolutionary processes in living organisms. For decades, scientists have been fascinated by the considerable variation in cell sizes among animals, yet systematic efforts to compile such data have been scarce. To address this gap, we employed a systematic map approach to create ErythroCite, an open-source database of fish erythrocyte sizes. This comprehensive resource encompasses 1,764 records from 660 species across four major lineages: Actinopterygii, Chondrichthyes, Dipnoi, and Cyclostomata. Our findings reveal a remarkable 414-fold range in cell volume, with most studies focusing on bony fishes and limited data on juveniles and earlier life stages. Life stage and sex were infrequently reported, but available data showed equal representation of adult females and males. ErythroCite offers valuable insights for studies in macroecology, macrophysiology, comparative physiology, evolutionary biology and cell biology. We anticipate this resource will facilitate comparative approaches and meta-analyses, globally driving further exploration of erythrocyte diversity and function in fish.
When using the data and/or code associated with this project, they should be cited as follows:
Leiva, F. P., Molina-Venegas, R., Alter, K., Freire, C. A., Hendriks, A. J., Hermaniuk, A., Serre-Fredj, L., Shokri, M., Czarnoleski, M., & Mark, F. C. (2025). ErythroCite: A systematic map and open-source database on red blood cell size of fishes.
Leiva, F. P., Molina-Venegas, R., Alter, K., Freire, C. A., Hendriks, A. J., Hermaniuk, A., Serre-Fredj, L., Shokri, M., Czarnoleski, M., & Mark, F. C. (2025). ErythroCite: A systematic map and open-source database on red blood cell size of fishes. Zenodo. https://doi.org/10.5281/zenodo.14781325
This script is authored by Félix P. Leiva. For any questions related to this resource, please contact me at the email address: felixpleiva@gmail.com.
This routine may contain typographical errors, as well as lines of code or comments in Spanish (my native language). I apologise for any inconvenience this might cause in understanding the code.
I will update the GitHub repository with any identified errors when appropriate. I therefore strongly recommend that users check the repository where the data are stored: https://github.com/felixpleiva/ErythroCite. This ensures access to the most current version of the code and data.
Should you encounter any errors in the code or data, please let me know via email.
Thank you!
This repository is provided by the author under the licence Attribution-NonCommercial-NoDerivatives 4.0 International.
rm(list = ls())
library(kableExtra) # Enhances tables created with 'knitr::kable'
library(DataExplorer) # Automates exploratory data analysis
library(dplyr) # Efficient data manipulation
library(ggplot2) # Data visualisation based on the grammar of graphics
library(RefManageR) # Manages references and citations
library(ggpubr) # Creates publication-ready graphics
library(cowplot) # Arranges and annotates plots
library(tidygeocoder) # Converts addresses into geographic coordinates
library(rnaturalearth) # Accesses Natural Earth geographic data
library(ape) # Analyses phylogenies and evolution
library(ggtree) # Visualises and annotates phylogenetic trees
library(tibble) # Alternative to data frames
library(ggthemes) # Additional themes for 'ggplot2' graphics
library(fishualize) # Fish-inspired colour palettes
library(sessioninfo) # Documents session environment for reproducibility
library(details) # Adds inline or interactive details
library(rfishbase) # Retrieve taxonomy from FishBase (https://www.fishbase.se)
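Before running the routine, it may help to confirm that every package listed above is installed. The sketch below is not part of the original script: `pkgs`, `installed` and `missing_pkgs` are illustrative names, and how each missing package is installed is left to the user (ggtree, for instance, is distributed through Bioconductor rather than CRAN).

```r
# Package names copied from the library() calls above
pkgs <- c("kableExtra", "DataExplorer", "dplyr", "ggplot2", "RefManageR",
          "ggpubr", "cowplot", "tidygeocoder", "rnaturalearth", "ape",
          "ggtree", "tibble", "ggthemes", "fishualize", "sessioninfo",
          "details", "rfishbase")

# requireNamespace() returns TRUE/FALSE without attaching the package
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All packages are available.")
}
```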
The associated data, as well as supplementary files, were directly imported from:
dat <- read.csv("../outputs/cell_size_with_taxonomy.csv")
tree <- read.tree("../outputs/Phylogenetic tree for 650 species included in ErythroCite.tre")
refs <- ReadBib("../outputs/ErythroCite literature.bib")
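Because the tree file covers a subset of the species in the data, a quick comparison of names can show which species lack a tip. The following is a minimal sketch on toy vectors, not the original routine. With the real objects one would compare `tree$tip.label` against `dat$species_underscored`, assuming both use the same underscored format (an assumption not confirmed here).

```r
# Toy stand-ins for tree$tip.label and dat$species_underscored
tip_labels <- c("Abalistes_stellatus", "Abramis_brama")
species_in_data <- c("Abalistes_stellatus", "Abramis_brama",
                     "Abudefduf_saxatilis", "Abudefduf_saxatilis")

# Species present in the data but absent from the tree
not_in_tree <- setdiff(unique(species_in_data), tip_labels)

# Tips present in the tree but absent from the data
not_in_data <- setdiff(tip_labels, unique(species_in_data))

not_in_tree
not_in_data
```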
str(dat)
## 'data.frame': 1765 obs. of 45 variables:
## $ species_reported : chr "Abalistes stellatus" "Abramis brama" "Abudefduf marginatus" "Abudefduf saxatilis" ...
## $ double_checked : chr "YES" "YES" "YES" "YES" ...
## $ database : chr "Gregory_2024" "Gregory_2024" "systematic_search_english" "felix" ...
## $ key : chr "9_gregory" "pdf_not_found_Gulliver_1875" "rayyan-33186100" "16_fpl" ...
## $ body_mass_gram : num NA NA NA NA NA NA NA NA NA NA ...
## $ sex : chr NA NA NA NA ...
## $ life_stage : chr NA NA NA NA ...
## $ lat_dec : num NA NA 32.4 18.2 18.2 ...
## $ long_dec : num NA NA -64.7 -66.5 -66.5 ...
## $ location_description: chr NA NA "near the Bermuda Biological Station, St. George, Bermuda\n" "southwestern and western coasts of Puerto Rico" ...
## $ sample_size : int NA NA 50 NA NA NA NA NA NA NA ...
## $ number_of_specimens : int NA NA NA NA NA NA NA NA NA NA ...
## $ estimate_error_type : chr NA NA "2SE" NA ...
## $ cell_length : num NA 10.6 NA 10.2 10 10.5 10.5 10 9 9 ...
## $ cell_length_error : num NA NA NA NA NA NA NA NA NA NA ...
## $ cell_width : num NA 7.1 NA 7.5 7.5 7.5 7.5 7.5 6.5 5.5 ...
## $ cell_width_error : num NA NA NA NA NA NA NA NA NA NA ...
## $ cell_area : num 43.9 59.4 NA NA NA ...
## $ cell_area_error : num NA NA NA NA NA NA NA NA NA NA ...
## $ cell_volume : num NA NA NA NA NA NA NA NA NA NA ...
## $ cell_volume_error : num NA NA NA NA NA NA NA NA NA NA ...
## $ mcv : num NA NA NA NA NA NA NA NA NA NA ...
## $ mcv_error : num NA NA NA NA NA NA NA NA NA NA ...
## $ nucleus_length : num NA NA NA NA NA NA NA NA NA NA ...
## $ nucleus_length_error: num NA NA NA NA NA NA NA NA NA NA ...
## $ nucleus_width : num NA NA NA NA NA NA NA NA NA NA ...
## $ nucleus_width_error : num NA NA NA NA NA NA NA NA NA NA ...
## $ nucleus_area : num 7.36 NA 4.1 NA NA NA NA NA NA NA ...
## $ nucleus_area_error : num NA NA 0.09 NA NA NA NA NA NA NA ...
## $ nucleus_volume : num NA NA NA NA NA NA NA NA NA NA ...
## $ nucleus_volume_error: num NA NA NA NA NA NA NA NA NA NA ...
## $ notes : chr "Hardie, D.C. and P.D.N. Hebert (2003). The nucleotypic effects of cellular DNA content in cartilaginous and ray"| __truncated__ "Gulliver, G. (1875). Observations on the sizes and shapes of the red corpuscles of the blood of vertebrates, wi"| __truncated__ "Table 1" "Saunders, D.C. (1966). Differential Blood Cell Counts of 121 Species of Marine Fishes of Puerto Rico" ...
## $ phylum : chr "Chordata" "Chordata" "Chordata" "Chordata" ...
## $ class : chr "Actinopterygii" "Actinopterygii" "Actinopterygii" "Actinopterygii" ...
## $ order : chr "Tetraodontiformes" "Cypriniformes" "Perciformes" "Perciformes" ...
## $ family : chr "Balistidae" "Leuciscidae" "Pomacentridae" "Pomacentridae" ...
## $ genus : chr "Abalistes" "Abramis" "Abudefduf" "Abudefduf" ...
## $ species : chr "Abalistes stellatus" "Abramis brama" "Abudefduf saxatilis" "Abudefduf saxatilis" ...
## $ source : chr "ncbi" "ncbi" "gbif" "ncbi" ...
## $ taxo_level : chr "Species" "Species" "Species" "Species" ...
## $ isMarine : int 1 0 1 1 1 1 1 1 1 1 ...
## $ isBrackish : int 0 1 0 0 0 0 0 0 0 0 ...
## $ isFresh : int 0 1 0 0 0 0 0 0 0 0 ...
## $ realm : chr "marine" "freshwater-brackish" "marine" "marine" ...
## $ species_underscored : chr "Abalistes_stellatus" "Abramis_brama" "Abudefduf_saxatilis" "Abudefduf_saxatilis" ...
dat$sex <- as.factor(dat$sex)
dat$life_stage <- as.factor(dat$life_stage)
dat$sample_size <- as.factor(dat$sample_size)
dat$number_of_specimens <- as.factor(dat$number_of_specimens)
str(dat)
## 'data.frame': 1765 obs. of 45 variables:
## $ species_reported : chr "Abalistes stellatus" "Abramis brama" "Abudefduf marginatus" "Abudefduf saxatilis" ...
## $ double_checked : chr "YES" "YES" "YES" "YES" ...
## $ database : chr "Gregory_2024" "Gregory_2024" "systematic_search_english" "felix" ...
## $ key : chr "9_gregory" "pdf_not_found_Gulliver_1875" "rayyan-33186100" "16_fpl" ...
## $ body_mass_gram : num NA NA NA NA NA NA NA NA NA NA ...
## $ sex : Factor w/ 3 levels "both","female",..: NA NA NA NA NA NA NA NA NA NA ...
## $ life_stage : Factor w/ 3 levels "adult","fingerlings",..: NA NA NA NA NA NA NA NA NA NA ...
## $ lat_dec : num NA NA 32.4 18.2 18.2 ...
## $ long_dec : num NA NA -64.7 -66.5 -66.5 ...
## $ location_description: chr NA NA "near the Bermuda Biological Station, St. George, Bermuda\n" "southwestern and western coasts of Puerto Rico" ...
## $ sample_size : Factor w/ 37 levels "1","2","3","4",..: NA NA 22 NA NA NA NA NA NA NA ...
## $ number_of_specimens : Factor w/ 45 levels "1","2","3","4",..: NA NA NA NA NA NA NA NA NA NA ...
## $ estimate_error_type : chr NA NA "2SE" NA ...
## $ cell_length : num NA 10.6 NA 10.2 10 10.5 10.5 10 9 9 ...
## $ cell_length_error : num NA NA NA NA NA NA NA NA NA NA ...
## $ cell_width : num NA 7.1 NA 7.5 7.5 7.5 7.5 7.5 6.5 5.5 ...
## $ cell_width_error : num NA NA NA NA NA NA NA NA NA NA ...
## $ cell_area : num 43.9 59.4 NA NA NA ...
## $ cell_area_error : num NA NA NA NA NA NA NA NA NA NA ...
## $ cell_volume : num NA NA NA NA NA NA NA NA NA NA ...
## $ cell_volume_error : num NA NA NA NA NA NA NA NA NA NA ...
## $ mcv : num NA NA NA NA NA NA NA NA NA NA ...
## $ mcv_error : num NA NA NA NA NA NA NA NA NA NA ...
## $ nucleus_length : num NA NA NA NA NA NA NA NA NA NA ...
## $ nucleus_length_error: num NA NA NA NA NA NA NA NA NA NA ...
## $ nucleus_width : num NA NA NA NA NA NA NA NA NA NA ...
## $ nucleus_width_error : num NA NA NA NA NA NA NA NA NA NA ...
## $ nucleus_area : num 7.36 NA 4.1 NA NA NA NA NA NA NA ...
## $ nucleus_area_error : num NA NA 0.09 NA NA NA NA NA NA NA ...
## $ nucleus_volume : num NA NA NA NA NA NA NA NA NA NA ...
## $ nucleus_volume_error: num NA NA NA NA NA NA NA NA NA NA ...
## $ notes : chr "Hardie, D.C. and P.D.N. Hebert (2003). The nucleotypic effects of cellular DNA content in cartilaginous and ray"| __truncated__ "Gulliver, G. (1875). Observations on the sizes and shapes of the red corpuscles of the blood of vertebrates, wi"| __truncated__ "Table 1" "Saunders, D.C. (1966). Differential Blood Cell Counts of 121 Species of Marine Fishes of Puerto Rico" ...
## $ phylum : chr "Chordata" "Chordata" "Chordata" "Chordata" ...
## $ class : chr "Actinopterygii" "Actinopterygii" "Actinopterygii" "Actinopterygii" ...
## $ order : chr "Tetraodontiformes" "Cypriniformes" "Perciformes" "Perciformes" ...
## $ family : chr "Balistidae" "Leuciscidae" "Pomacentridae" "Pomacentridae" ...
## $ genus : chr "Abalistes" "Abramis" "Abudefduf" "Abudefduf" ...
## $ species : chr "Abalistes stellatus" "Abramis brama" "Abudefduf saxatilis" "Abudefduf saxatilis" ...
## $ source : chr "ncbi" "ncbi" "gbif" "ncbi" ...
## $ taxo_level : chr "Species" "Species" "Species" "Species" ...
## $ isMarine : int 1 0 1 1 1 1 1 1 1 1 ...
## $ isBrackish : int 0 1 0 0 0 0 0 0 0 0 ...
## $ isFresh : int 0 1 0 0 0 0 0 0 0 0 ...
## $ realm : chr "marine" "freshwater-brackish" "marine" "marine" ...
## $ species_underscored : chr "Abalistes_stellatus" "Abramis_brama" "Abudefduf_saxatilis" "Abudefduf_saxatilis" ...
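Converting counts such as `sample_size` to factors makes `summary()` tabulate them, but the numbers are then stored as level labels rather than values. The sketch below uses a toy vector, not the ErythroCite data, to show how the original numbers could be recovered if they are needed later.

```r
# Toy sample sizes, including a missing value
sample_size <- c(50, NA, 30, 5, 30)
sample_size_f <- as.factor(sample_size)

# as.numeric() on a factor returns the internal level codes, not the values
codes <- as.numeric(sample_size_f)

# Going through as.character() recovers the original numbers
values <- as.numeric(as.character(sample_size_f))

codes
values
```

This is why a sample size of 50 can appear as a different integer code in the `str()` output of a factor column.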
head(dat)
## species_reported double_checked database
## 1 Abalistes stellatus YES Gregory_2024
## 2 Abramis brama YES Gregory_2024
## 3 Abudefduf marginatus YES systematic_search_english
## 4 Abudefduf saxatilis YES felix
## 5 Abudefduf saxatilis YES felix
## 6 Abudefduf saxatilis YES felix
## key body_mass_gram sex life_stage lat_dec long_dec
## 1 9_gregory NA <NA> <NA> NA NA
## 2 pdf_not_found_Gulliver_1875 NA <NA> <NA> NA NA
## 3 rayyan-33186100 NA <NA> <NA> 32.36700 -64.69760
## 4 16_fpl NA <NA> <NA> 18.22477 -66.48583
## 5 16_fpl NA <NA> <NA> 18.22477 -66.48583
## 6 16_fpl NA <NA> <NA> 18.22477 -66.48583
## location_description sample_size
## 1 <NA> <NA>
## 2 <NA> <NA>
## 3 near the Bermuda Biological Station, St. George, Bermuda\n 50
## 4 southwestern and western coasts of Puerto Rico <NA>
## 5 southwestern and western coasts of Puerto Rico <NA>
## 6 southwestern and western coasts of Puerto Rico <NA>
## number_of_specimens estimate_error_type cell_length cell_length_error
## 1 <NA> <NA> NA NA
## 2 <NA> <NA> 10.6 NA
## 3 <NA> 2SE NA NA
## 4 <NA> <NA> 10.2 NA
## 5 <NA> <NA> 10.0 NA
## 6 <NA> <NA> 10.5 NA
## cell_width cell_width_error cell_area cell_area_error cell_volume
## 1 NA NA 43.91 NA NA
## 2 7.1 NA 59.39 NA NA
## 3 NA NA NA NA NA
## 4 7.5 NA NA NA NA
## 5 7.5 NA NA NA NA
## 6 7.5 NA NA NA NA
## cell_volume_error mcv mcv_error nucleus_length nucleus_length_error
## 1 NA NA NA NA NA
## 2 NA NA NA NA NA
## 3 NA NA NA NA NA
## 4 NA NA NA NA NA
## 5 NA NA NA NA NA
## 6 NA NA NA NA NA
## nucleus_width nucleus_width_error nucleus_area nucleus_area_error
## 1 NA NA 7.36 NA
## 2 NA NA NA NA
## 3 NA NA 4.10 0.09
## 4 NA NA NA NA
## 5 NA NA NA NA
## 6 NA NA NA NA
## nucleus_volume nucleus_volume_error
## 1 NA NA
## 2 NA NA
## 3 NA NA
## 4 NA NA
## 5 NA NA
## 6 NA NA
## notes
## 1 Hardie, D.C. and P.D.N. Hebert (2003). The nucleotypic effects of cellular DNA content in cartilaginous and ray-finned fishes. Genome 46: 683-706.
## 2 Gulliver, G. (1875). Observations on the sizes and shapes of the red corpuscles of the blood of vertebrates, with drawings of them to a uniform scale, and extended and revised tables of measurements. Proceedings of the Zoological Society of London 1875: 474-495.
## 3 Table 1
## 4 Saunders, D.C. (1966). Differential Blood Cell Counts of 121 Species of Marine Fishes of Puerto Rico
## 5 Saunders, D.C. (1966). Differential Blood Cell Counts of 121 Species of Marine Fishes of Puerto Rico
## 6 Saunders, D.C. (1966). Differential Blood Cell Counts of 121 Species of Marine Fishes of Puerto Rico
## phylum class order family genus
## 1 Chordata Actinopterygii Tetraodontiformes Balistidae Abalistes
## 2 Chordata Actinopterygii Cypriniformes Leuciscidae Abramis
## 3 Chordata Actinopterygii Perciformes Pomacentridae Abudefduf
## 4 Chordata Actinopterygii Perciformes Pomacentridae Abudefduf
## 5 Chordata Actinopterygii Perciformes Pomacentridae Abudefduf
## 6 Chordata Actinopterygii Perciformes Pomacentridae Abudefduf
## species source taxo_level isMarine isBrackish isFresh
## 1 Abalistes stellatus ncbi Species 1 0 0
## 2 Abramis brama ncbi Species 0 1 1
## 3 Abudefduf saxatilis gbif Species 1 0 0
## 4 Abudefduf saxatilis ncbi Species 1 0 0
## 5 Abudefduf saxatilis ncbi Species 1 0 0
## 6 Abudefduf saxatilis ncbi Species 1 0 0
## realm species_underscored
## 1 marine Abalistes_stellatus
## 2 freshwater-brackish Abramis_brama
## 3 marine Abudefduf_saxatilis
## 4 marine Abudefduf_saxatilis
## 5 marine Abudefduf_saxatilis
## 6 marine Abudefduf_saxatilis
The checks below follow the code of Pottier et al. (2021), "Sexual (in)equality? A meta-analysis of sex differences in thermal acclimation capacity across ectotherms" (also cited in the main text).
kable(summary(dat), "html") %>%
  kable_styling("striped", position = "left") %>%
  scroll_box(width = "100%", height = "500px")
| species_reported | double_checked | database | key | body_mass_gram | sex | life_stage | lat_dec | long_dec | location_description | sample_size | number_of_specimens | estimate_error_type | cell_length | cell_length_error | cell_width | cell_width_error | cell_area | cell_area_error | cell_volume | cell_volume_error | mcv | mcv_error | nucleus_length | nucleus_length_error | nucleus_width | nucleus_width_error | nucleus_area | nucleus_area_error | nucleus_volume | nucleus_volume_error | notes | phylum | class | order | family | genus | species | source | taxo_level | isMarine | isBrackish | isFresh | realm | species_underscored | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Length:1765 | Length:1765 | Length:1765 | Length:1765 | Min. : 0.51 | both : 98 | adult : 204 | Min. :-42.39 | Min. :-107.31 | Length:1765 | 30 : 30 | 10 : 24 | Length:1765 | Min. : 5.51 | Min. :0.0090 | Min. : 3.740 | Min. : 0.008 | Min. : 16.22 | Min. : 0.120 | Min. : 41.08 | Min. : 1.438 | Min. : 0.019 | Min. : 0.0014 | Min. : 2.240 | Min. :0.0100 | Min. :1.440 | Min. :0.0200 | Min. : 2.56 | Min. : 0.030 | Min. : 3.42 | Min. : 0.100 | Length:1765 | Length:1765 | Length:1765 | Length:1765 | Length:1765 | Length:1765 | Length:1765 | Length:1765 | Length:1765 | Min. :0.0000 | Min. :0.0000 | Min. :0.0000 | Length:1765 | Length:1765 | |
| Class :character | Class :character | Class :character | Class :character | 1st Qu.: 25.68 | female: 47 | fingerlings: 9 | 1st Qu.: 18.22 | 1st Qu.: -66.49 | Class :character | 1 : 25 | 20 : 14 | Class :character | 1st Qu.: 9.50 | 1st Qu.:0.2075 | 1st Qu.: 6.500 | 1st Qu.: 0.220 | 1st Qu.: 58.05 | 1st Qu.: 2.435 | 1st Qu.: 286.00 | 1st Qu.:11.700 | 1st Qu.: 107.025 | 1st Qu.: 2.8162 | 1st Qu.: 4.400 | 1st Qu.:0.2000 | 1st Qu.:3.000 | 1st Qu.:0.1500 | 1st Qu.: 10.84 | 1st Qu.: 0.500 | 1st Qu.: 18.75 | 1st Qu.: 1.173 | Class :character | Class :character | Class :character | Class :character | Class :character | Class :character | Class :character | Class :character | Class :character | 1st Qu.:0.0000 | 1st Qu.:0.0000 | 1st Qu.:0.0000 | Class :character | Class :character | |
| Mode :character | Mode :character | Mode :character | Mode :character | Median : 78.10 | male : 41 | juvenile : 72 | Median : 18.22 | Median : -66.49 | Mode :character | 20 : 24 | 30 : 14 | Mode :character | Median :10.00 | Median :0.4150 | Median : 7.500 | Median : 0.340 | Median : 70.38 | Median : 4.250 | Median : 579.52 | Median :23.000 | Median : 150.738 | Median : 11.8000 | Median : 5.250 | Median :0.3200 | Median :3.500 | Median :0.2400 | Median : 14.03 | Median : 1.190 | Median : 39.19 | Median : 2.000 | Mode :character | Mode :character | Mode :character | Mode :character | Mode :character | Mode :character | Mode :character | Mode :character | Mode :character | Median :1.0000 | Median :0.0000 | Median :0.0000 | Mode :character | Mode :character | |
| NA | NA | NA | NA | Mean : 1802.84 | NA’s :1579 | NA’s :1480 | Mean : 20.39 | Mean : -10.25 | NA | 50 : 24 | 8 : 12 | NA | Mean :11.15 | Mean :0.6021 | Mean : 7.791 | Mean : 2.692 | Mean : 94.99 | Mean : 6.717 | Mean : 569.99 | Mean :26.096 | Mean : 201.688 | Mean : 19.4054 | Mean : 5.427 | Mean :0.3815 | Mean :3.739 | Mean :0.2902 | Mean : 18.91 | Mean : 1.865 | Mean : 70.10 | Mean : 2.625 | NA | NA | NA | NA | NA | NA | NA | NA | NA | Mean :0.6634 | Mean :0.4776 | Mean :0.4593 | NA | NA | |
| NA | NA | NA | NA | 3rd Qu.: 203.12 | NA | NA | 3rd Qu.: 30.06 | 3rd Qu.: 76.95 | NA | 5 : 22 | 1 : 11 | NA | 3rd Qu.:11.80 | 3rd Qu.:0.8575 | 3rd Qu.: 8.408 | 3rd Qu.: 0.690 | 3rd Qu.: 89.19 | 3rd Qu.: 6.700 | 3rd Qu.: 693.54 | 3rd Qu.:37.550 | 3rd Qu.: 204.750 | 3rd Qu.: 24.0250 | 3rd Qu.: 6.120 | 3rd Qu.:0.5000 | 3rd Qu.:4.225 | 3rd Qu.:0.4000 | 3rd Qu.: 19.59 | 3rd Qu.: 2.178 | 3rd Qu.:105.73 | 3rd Qu.: 3.683 | NA | NA | NA | NA | NA | NA | NA | NA | NA | 3rd Qu.:1.0000 | 3rd Qu.:1.0000 | 3rd Qu.:1.0000 | NA | NA | |
| NA | NA | NA | NA | Max. :217271.00 | NA | NA | Max. : 60.95 | Max. : 146.49 | NA | (Other): 91 | (Other): 144 | NA | Max. :44.60 | Max. :2.9530 | Max. :27.000 | Max. :259.334 | Max. :944.70 | Max. :66.600 | Max. :1889.41 | Max. :78.890 | Max. :6940.000 | Max. :293.0000 | Max. :17.500 | Max. :1.4260 | Max. :9.750 | Max. :1.2000 | Max. :157.33 | Max. :14.900 | Max. :307.64 | Max. :11.800 | NA | NA | NA | NA | NA | NA | NA | NA | NA | Max. :1.0000 | Max. :1.0000 | Max. :1.0000 | NA | NA | |
| NA | NA | NA | NA | NA’s :1283 | NA | NA | NA’s :525 | NA’s :525 | NA | NA’s :1549 | NA’s :1546 | NA | NA’s :867 | NA’s :1649 | NA’s :867 | NA’s :1648 | NA’s :1028 | NA’s :1671 | NA’s :1650 | NA’s :1724 | NA’s :1463 | NA’s :1501 | NA’s :1598 | NA’s :1660 | NA’s :1598 | NA’s :1660 | NA’s :1307 | NA’s :1651 | NA’s :1598 | NA’s :1697 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA’s :18 | NA’s :21 | NA’s :21 | NA | NA |
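The NA counts in the summary indicate that most size metrics are reported for only a subset of records. The share of missing values per column can be computed directly. The sketch below uses a small toy data frame rather than `dat`, and `toy`, `na_count` and `na_share` are illustrative names.

```r
# Toy data frame with a few size columns and missing values
toy <- data.frame(
  cell_length = c(NA, 10.6, NA, 10.2),
  cell_width  = c(NA, 7.1, NA, 7.5),
  cell_area   = c(43.91, 59.39, NA, NA),
  mcv         = rep(NA_real_, 4)
)

# Number and percentage of missing values per column
na_count <- colSums(is.na(toy))
na_share <- round(100 * na_count / nrow(toy), 1)

data.frame(na_count, na_share)
```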
In our analysis, we assessed a range of cellular parameters across multiple studies: cell area, cell volume, nucleus area, nucleus volume, and mean corpuscular volume (MCV). By comparing the mean, minimum, maximum, and standard deviation of each metric per study, we aim to identify potential outliers and determine whether extreme values are predominantly associated with specific studies. Because the summary functions are called without `na.rm = TRUE`, a study shows NA for a metric whenever any of its records lacks that metric.
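As a complement, a per-study summary can drop missing values within each study before summarising, so that studies reporting a metric for only some records still contribute. This is a minimal base-R sketch on toy data, not the original routine. The keys and values are hypothetical and `safe_stat` is an illustrative helper.

```r
# Toy data: two hypothetical studies, one with partially missing values
toy <- data.frame(
  key       = c("study_a", "study_a", "study_a", "study_b", "study_b"),
  cell_area = c(60, NA, 80, NA, NA)
)

# Apply a summary function after dropping NAs; return NA for empty groups
safe_stat <- function(x, stat) {
  x <- x[!is.na(x)]
  if (length(x) == 0) NA_real_ else stat(x)
}

groups <- split(toy$cell_area, toy$key)

per_study <- data.frame(
  key        = names(groups),
  mean       = vapply(groups, safe_stat, numeric(1), stat = mean),
  min        = vapply(groups, safe_stat, numeric(1), stat = min),
  max        = vapply(groups, safe_stat, numeric(1), stat = max),
  n_reported = vapply(groups, function(x) sum(!is.na(x)), integer(1)),
  row.names  = NULL
)

per_study
```

Reporting `n_reported` alongside the statistics makes it clear how many records each value is based on.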
kable(dat %>%
        group_by(key) %>%
        summarise(mean_cell_area = mean(cell_area), max_cell_area = max(cell_area), min_cell_area = min(cell_area), sd_cell_area = sd(cell_area),
                  mean_cell_volume = mean(cell_volume), max_cell_volume = max(cell_volume), min_cell_volume = min(cell_volume), sd_cell_volume = sd(cell_volume),
                  mean_nucleus_area = mean(nucleus_area), max_nucleus_area = max(nucleus_area), min_nucleus_area = min(nucleus_area), sd_nucleus_area = sd(nucleus_area),
                  mean_nucleus_volume = mean(nucleus_volume), max_nucleus_volume = max(nucleus_volume), min_nucleus_volume = min(nucleus_volume), sd_nucleus_volume = sd(nucleus_volume),
                  mean_mcv = mean(mcv), max_mcv = max(mcv), min_mcv = min(mcv), sd_mcv = sd(mcv),
                  n = n())) %>%
  kable_styling("striped", position = "left") %>%
  scroll_box(width = "100%", height = "500px")
| key | mean_cell_area | max_cell_area | min_cell_area | sd_cell_area | mean_cell_volume | max_cell_volume | min_cell_volume | sd_cell_volume | mean_nucleus_area | max_nucleus_area | min_nucleus_area | sd_nucleus_area | mean_nucleus_volume | max_nucleus_volume | min_nucleus_volume | sd_nucleus_volume | mean_mcv | max_mcv | min_mcv | sd_mcv | n |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 11_fpl | 73.20000 | 73.20000 | 73.20000 | NA | NA | NA | NA | NA | 13.300000 | 13.300000 | 13.300000 | NA | NA | NA | NA | NA | 163.80000 | 163.80000 | 163.80000 | NA | 1 |
| 12_fpl | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 294.95000 | 529.50000 | 139.70000 | 156.4641013 | 6 |
| 13_gregory | 123.48071 | 303.98000 | 37.70000 | 93.6114967 | NA | NA | NA | NA | 18.417143 | 42.190000 | 7.760000 | 11.094015 | NA | NA | NA | NA | NA | NA | NA | NA | 14 |
| 14_fpl | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 1 |
| 15_fpl | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 3 |
| 16_fpl | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 601 |
| 17_gregory | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 12 |
| 18_fpl | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 4 |
| 1_fpl | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 1 |
| 2_fpl | 75.60000 | 75.60000 | 75.60000 | NA | NA | NA | NA | NA | 12.000000 | 12.000000 | 12.000000 | NA | NA | NA | NA | NA | NA | NA | NA | NA | 1 |
| 3_gregory | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 13 |
| 5_fpl | NA | NA | NA | NA | NA | NA | NA | NA | 19.060000 | 21.600000 | 10.500000 | 3.266054 | NA | NA | NA | NA | NA | NA | NA | NA | 10 |
| 6_gregory | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 7 |
| 9_gregory | 99.62252 | 639.02000 | 32.34000 | 102.7547647 | NA | NA | NA | NA | 21.001261 | 157.330000 | 5.460000 | 24.679091 | NA | NA | NA | NA | NA | NA | NA | NA | 222 |
| pdf_not_found_Gulliver_1875 | 127.30679 | 944.70000 | 42.80000 | 144.6876504 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 81 |
| pdf_not_found_Kisch_1949a | NA | NA | NA | NA | NA | NA | NA | NA | 19.067273 | 38.170000 | 7.480000 | 11.493088 | NA | NA | NA | NA | NA | NA | NA | NA | 11 |
| pdf_not_found_Kisch_1949b | 123.51750 | 245.26000 | 64.87000 | 83.0318157 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 4 |
| pdf_not_found_Kisch_1951 | 207.45000 | 542.99000 | 75.76000 | 174.4467711 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 6 |
| pdf_not_found_Potter_et_al_1982 | 117.18000 | 128.68000 | 105.68000 | 16.2634560 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 2 |
| pdf_not_found_Wintrobe_1933 | 138.54600 | 390.34000 | 51.44000 | 111.6304454 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 20 |
| rayyan-33184974 | 63.63636 | 73.47000 | 44.88000 | 7.3568126 | NA | NA | NA | NA | 21.515455 | 29.170000 | 13.960000 | 4.179964 | NA | NA | NA | NA | NA | NA | NA | NA | 22 |
| rayyan-33185067 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 134.14000 | 139.04000 | 129.24000 | 6.9296465 | 2 |
| rayyan-33185224 | 98.87500 | 102.96000 | 94.79000 | 5.7770624 | NA | NA | NA | NA | 17.035000 | 18.850000 | 15.220000 | 2.566798 | NA | NA | NA | NA | NA | NA | NA | NA | 2 |
| rayyan-33185285 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 1 |
| rayyan-33185655 | NA | NA | NA | NA | 266.0000 | 266.0000 | 266.0000 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 1 |
| rayyan-33185673 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 74.60000 | 74.60000 | 74.60000 | NA | 1 |
| rayyan-33185780 | 89.36252 | 186.04800 | 36.25600 | 28.4751755 | 719.8610 | 1889.4072 | 165.7881 | 343.10499 | 15.822703 | 30.007000 | 6.350200 | 4.880071 | 128.556361 | 307.6396 | 28.29905 | 62.970993 | NA | NA | NA | NA | 74 |
| rayyan-33185781 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 153.57500 | 317.90000 | 75.00000 | 46.8768469 | 52 |
| rayyan-33185785 | NA | NA | NA | NA | 268.0000 | 459.0000 | 157.0000 | 94.00473 | NA | NA | NA | NA | 46.600000 | 91.0000 | 24.00000 | 19.351715 | NA | NA | NA | NA | 10 |
| rayyan-33185798 | 78.38389 | 121.23045 | 46.37760 | 36.0588349 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 4 |
| rayyan-33185801 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 1 |
| rayyan-33185803 | 265.90000 | 668.70000 | 127.00000 | 227.2101890 | 216.3000 | 728.3000 | 54.8000 | 287.70053 | 41.420000 | 76.200000 | 29.800000 | 19.663342 | 22.960000 | 58.1000 | 11.20000 | 19.839430 | NA | NA | NA | NA | 5 |
| rayyan-33185807 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 490.00000 | 490.00000 | 490.00000 | NA | 1 |
| rayyan-33185817 | 78.79500 | 110.50000 | 56.33000 | 18.6703133 | 445.2200 | 695.1800 | 268.3000 | 160.72851 | 11.318750 | 19.130000 | 8.360000 | 3.427921 | 23.223750 | 46.3400 | 13.63000 | 10.103183 | NA | NA | NA | NA | 8 |
| rayyan-33185819 | 257.70218 | 257.70218 | 257.70218 | NA | 260.7721 | 260.7721 | 260.7721 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 1 |
| rayyan-33185820 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 28.700000 | 30.3000 | 27.10000 | 2.262742 | NA | NA | NA | NA | 2 |
| rayyan-33185881 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 132.20000 | 132.20000 | 132.20000 | NA | 1 |
| rayyan-33185890 | 170.90000 | 170.90000 | 170.90000 | NA | NA | NA | NA | NA | 37.000000 | 37.000000 | 37.000000 | NA | NA | NA | NA | NA | NA | NA | NA | NA | 1 |
| rayyan-33185896 | 89.36000 | 109.02000 | 76.65000 | 14.2078077 | NA | NA | NA | NA | 17.317500 | 19.440000 | 13.970000 | 2.377749 | NA | NA | NA | NA | NA | NA | NA | NA | 4 |
| rayyan-33185905 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 139.82000 | 139.82000 | 139.82000 | NA | 1 |
| rayyan-33185912 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 9 |
| rayyan-33185919 | 43.94286 | 51.90000 | 34.30000 | 6.4562262 | 186.2286 | 225.6000 | 131.4000 | 33.57766 | 6.042857 | 7.700000 | 4.600000 | 1.357519 | 9.057143 | 12.2000 | 6.40000 | 2.330849 | NA | NA | NA | NA | 7 |
| rayyan-33185922 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 4 |
| rayyan-33185928 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 121.65500 | 151.20000 | 92.11000 | 41.7829397 | 2 |
| rayyan-33185981 | NA | NA | NA | NA | NA | NA | NA | NA | 15.200000 | 15.200000 | 15.200000 | NA | NA | NA | NA | NA | NA | NA | NA | NA | 1 |
| rayyan-33185991 | 78.68650 | 83.67400 | 72.53900 | 4.8649538 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 4 |
| rayyan-33186081 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 406.00000 | 406.00000 | 406.00000 | NA | 1 |
| rayyan-33186083 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 183.64000 | 183.64000 | 183.64000 | NA | 1 |
| rayyan-33186084 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 100.19000 | 100.19000 | 100.19000 | NA | 1 |
| rayyan-33186095 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 2 |
| rayyan-33186100 | NA | NA | NA | NA | NA | NA | NA | NA | 4.706667 | 7.100000 | 2.800000 | 1.275856 | NA | NA | NA | NA | NA | NA | NA | NA | 15 |
| rayyan-33186105 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 397.26315 | 418.87720 | 375.64910 | 30.5668826 | 2 |
| rayyan-33186107 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 1 |
| rayyan-33186108 | 336.50000 | 348.00000 | 328.00000 | 9.2209905 | NA | NA | NA | NA | 61.800000 | 64.700000 | 58.700000 | 2.741046 | NA | NA | NA | NA | NA | NA | NA | NA | 4 |
| rayyan-33186111 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 156.00000 | 156.00000 | 156.00000 | NA | 1 |
| rayyan-33186112 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 1 |
| rayyan-33186116 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 2 |
| rayyan-33186120 | 54.70000 | 54.70000 | 54.70000 | NA | NA | NA | NA | NA | 9.080000 | 9.080000 | 9.080000 | NA | NA | NA | NA | NA | NA | NA | NA | NA | 1 |
| rayyan-33186121 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 151.10000 | 173.30000 | 128.90000 | 31.3955411 | 2 |
| rayyan-33186182 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 3 |
| rayyan-33186189 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 72.57500 | 74.69000 | 70.46000 | 2.9910617 | 2 |
| rayyan-33186200 | 65.68500 | 74.16000 | 57.21000 | 11.9854599 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 2 |
| rayyan-33186205 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 181.47000 | 181.47000 | 181.47000 | NA | 1 |
| rayyan-33186206 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 122.38000 | 132.72000 | 114.48000 | 7.6032318 | 4 |
| rayyan-33186208 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 108.11000 | 108.11000 | 108.11000 | NA | 1 |
| rayyan-33186211 | NA | NA | NA | NA | NA | NA | NA | NA | 5.357174 | 7.721201 | 3.158679 | 1.588524 | NA | NA | NA | NA | NA | NA | NA | NA | 7 |
| rayyan-33186212 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 181.56000 | 181.56000 | 181.56000 | NA | 1 |
| rayyan-33186219 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 115.40000 | 115.40000 | 115.40000 | NA | 1 |
| rayyan-33186224 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 141.05000 | 155.89000 | 126.21000 | 20.9869293 | 2 |
| rayyan-33186225 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 228.60000 | 242.00000 | 215.20000 | 18.9504617 | 2 |
| rayyan-33186227 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 302.20000 | 302.20000 | 302.20000 | NA | 1 |
| rayyan-33186280 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 224.07333 | 270.06000 | 197.26000 | 40.0084058 | 3 |
| rayyan-33186286 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 92.80000 | 93.00000 | 92.50000 | 0.2645751 | 3 |
| rayyan-33186290 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 75.10000 | 75.10000 | 75.10000 | NA | 1 |
| rayyan-33186293 | 16.22000 | 16.22000 | 16.22000 | NA | 41.0800 | 41.0800 | 41.0800 | NA | 2.560000 | 2.560000 | 2.560000 | NA | 3.420000 | 3.4200 | 3.42000 | NA | 307.24000 | 307.24000 | 307.24000 | NA | 1 |
| rayyan-33186294 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 229.76000 | 229.76000 | 229.76000 | NA | 1 |
| rayyan-33186297 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 621.20700 | 621.20700 | 621.20700 | NA | 1 |
| rayyan-33186305 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 6940.00000 | 6940.00000 | 6940.00000 | NA | 1 |
| rayyan-33186308 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 153.01667 | 170.20000 | 138.60000 | 13.9341906 | 6 |
| rayyan-33186313 | 63.11977 | 78.63471 | 44.32959 | 8.2134960 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 75 |
| rayyan-33186317 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 177.60000 | 177.60000 | 177.60000 | NA | 1 |
| rayyan-33186325 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 272.16333 | 291.76000 | 259.73000 | 17.1745519 | 3 |
| rayyan-33186327 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 391.48000 | 391.48000 | 391.48000 | NA | 1 |
| rayyan-33186328 | 105.35875 | 118.32000 | 92.52000 | 11.2868209 | NA | NA | NA | NA | 15.912500 | 21.010000 | 10.480000 | 4.356931 | NA | NA | NA | NA | NA | NA | NA | NA | 8 |
| rayyan-33186379 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 191.20000 | 191.20000 | 191.20000 | NA | 1 |
| rayyan-33186384 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 97.53000 | 97.53000 | 97.53000 | NA | 1 |
| rayyan-33186396 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 98.30000 | 98.30000 | 98.30000 | NA | 1 |
| rayyan-33186404 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 0.01912 | 0.01912 | 0.01912 | NA | 1 |
| rayyan-33186421 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 59.57516 | 89.84954 | 27.85492 | 21.5825675 | 10 |
| rayyan-33186423 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 158.33750 | 198.17000 | 136.98000 | 28.4639332 | 4 |
| rayyan-33186424 | 59.60000 | 59.60000 | 59.60000 | NA | 296.3000 | 296.3000 | 296.3000 | NA | 11.000000 | 11.000000 | 11.000000 | NA | 23.000000 | 23.0000 | 23.00000 | NA | NA | NA | NA | NA | 1 |
| rayyan-33186479 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 154.50000 | 193.30000 | 120.40000 | 34.3531658 | 4 |
| rayyan-33186484 | NA | NA | NA | NA | 421.5600 | 421.5600 | 421.5600 | NA | NA | NA | NA | NA | 19.850000 | 19.8500 | 19.85000 | NA | NA | NA | NA | NA | 1 |
| rayyan-33186485 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 362.50000 | 372.00000 | 353.00000 | 13.4350288 | 2 |
| rayyan-33186492 | 83.80000 | 83.80000 | 83.80000 | NA | 439.0000 | 439.0000 | 439.0000 | NA | 16.800000 | 16.800000 | 16.800000 | NA | 40.360000 | 40.3600 | 40.36000 | NA | NA | NA | NA | NA | 1 |
| rayyan-33186493 | NA | NA | NA | NA | 228.7600 | 265.8100 | 191.7100 | 52.39661 | NA | NA | NA | NA | 9.995000 | 11.1800 | 8.81000 | 1.675843 | NA | NA | NA | NA | 2 |
| rayyan-33186496 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 4 |
| rayyan-33186506 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 341.87500 | 444.00000 | 257.00000 | 55.3003165 | 8 |
| rayyan-33186518 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 178.28000 | 178.28000 | 178.28000 | NA | 1 |
| rayyan-33186519 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 1 |
| rayyan-33186584 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 115.40000 | 115.40000 | 115.40000 | NA | 1 |
| rayyan-33186596 | 75.48500 | 80.65000 | 70.32000 | 7.3044130 | NA | NA | NA | NA | 12.075000 | 13.100000 | 11.050000 | 1.449569 | NA | NA | NA | NA | 186.56500 | 206.99000 | 166.14000 | 28.8853120 | 2 |
| rayyan-33186620 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 131.45000 | 190.40000 | 71.54000 | 65.6046269 | 4 |
| rayyan-33186621 | 71.90906 | 90.67934 | 48.57124 | 7.7024691 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 60 |
| rayyan-33186625 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 243.08000 | 261.16000 | 225.46000 | 14.8105300 | 5 |
| rayyan-33186628 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 1 |
| rayyan-33186684 | 400.00000 | 400.00000 | 400.00000 | NA | 280.0000 | 280.0000 | 280.0000 | NA | NA | NA | NA | NA | 29.700000 | 29.7000 | 29.70000 | NA | NA | NA | NA | NA | 1 |
| rayyan-33186685 | 57.40353 | 68.89622 | 48.26075 | 4.7742315 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 59 |
| rayyan-33186690 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 122.28000 | 122.28000 | 122.28000 | NA | 1 |
| rayyan-33186704 | 87.00000 | 87.00000 | 87.00000 | NA | 439.1000 | 439.1000 | 439.1000 | NA | 14.900000 | 14.900000 | 14.900000 | NA | 13.300000 | 13.3000 | 13.30000 | NA | NA | NA | NA | NA | 1 |
| rayyan-33186789 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 266.69604 | 266.69604 | 266.69604 | NA | 1 |
| rayyan-33186801 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 168.11500 | 218.19000 | 139.34000 | 22.2362668 | 10 |
| rayyan-33186803 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 156.19608 | 156.19608 | 156.19608 | NA | 1 |
| rayyan-33186823 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 129.50000 | 161.90000 | 109.40000 | 22.5983775 | 4 |
| rayyan-33186891 | 85.87895 | 122.10000 | 50.80000 | 17.1464794 | NA | NA | NA | NA | 14.915790 | 22.800000 | 8.200000 | 4.493732 | NA | NA | NA | NA | NA | NA | NA | NA | 19 |
| rayyan-33186984 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 191.40000 | 191.40000 | 191.40000 | NA | 1 |
| rayyan-33186990 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 184.60000 | 197.30000 | 171.90000 | 17.9605122 | 2 |
| rayyan-33187004 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 139.00000 | 139.00000 | 139.00000 | NA | 1 |
| rayyan-33187005 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 179.73000 | 179.73000 | 179.73000 | NA | 1 |
| rayyan-33187009 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 38.82000 | 38.82000 | 38.82000 | NA | 1 |
| rayyan-33187015 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 1 |
| rayyan-33187026 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 145.09000 | 150.71000 | 141.02000 | 3.4772288 | 6 |
| rayyan-33187027 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 81.25000 | 81.25000 | 81.25000 | NA | 1 |
| rayyan-33187079 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 190.25000 | 236.00000 | 160.00000 | 23.5356872 | 8 |
| rayyan-33187080 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 41.52199 | 46.99074 | 34.02778 | 5.4298517 | 4 |
| rayyan-33187087 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 324.00000 | 324.00000 | 324.00000 | NA | 1 |
| rayyan-33187103 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 152.61454 | 155.20987 | 150.02469 | 2.0745338 | 9 |
| rayyan-33187114 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 106.20000 | 110.80000 | 104.00000 | 3.1198291 | 4 |
| rayyan-33187116 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 255.60667 | 274.04000 | 243.92000 | 16.1536910 | 3 |
| rayyan-33187182 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 544.80000 | 754.00000 | 335.60000 | 295.8534772 | 2 |
| rayyan-33187198 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 546.35333 | 765.03000 | 407.78000 | 191.6228030 | 3 |
| rayyan-33187201 | NA | NA | NA | NA | 750.3900 | 750.3900 | 750.3900 | NA | NA | NA | NA | NA | 48.450000 | 48.4500 | 48.45000 | NA | NA | NA | NA | NA | 1 |
| rayyan-33187210 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 259.21500 | 276.27000 | 242.16000 | 24.1194123 | 2 |
| rayyan-33187218 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 107.70000 | 107.70000 | 107.70000 | NA | 1 |
| rayyan-33187284 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 212.47000 | 212.47000 | 212.47000 | NA | 1 |
| rayyan-33187321 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 259.00000 | 259.00000 | 259.00000 | NA | 1 |
| rayyan-33187323 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 195.15000 | 218.30000 | 172.00000 | 32.7390440 | 2 |
| rayyan-33187326 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 2 |
| rayyan-33187379 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 396.00000 | 396.00000 | 396.00000 | NA | 1 |
| rayyan-33187386 | 91.72000 | 92.20000 | 91.24000 | 0.6788225 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 2 |
| rayyan-33187393 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 25.000000 | 25.0000 | 25.00000 | NA | NA | NA | NA | NA | 1 |
| rayyan-33187394 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 253.20000 | 253.20000 | 253.20000 | NA | 1 |
| rayyan-33187419 | 38.22502 | 47.19851 | 28.38398 | 9.7233924 | NA | NA | NA | NA | 12.733002 | 17.837841 | 6.787559 | 4.823711 | NA | NA | NA | NA | NA | NA | NA | NA | 4 |
| rayyan-33187421 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 241.00000 | 241.00000 | 241.00000 | NA | 1 |
| rayyan-33187502 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 2 |
| rayyan-33187593 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 48.25000 | 48.25000 | 48.25000 | NA | 1 |
| rayyan-33187601 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 36.15000 | 36.90000 | 35.40000 | 1.0606602 | 2 |
| rayyan-33187610 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 104.47250 | 121.97000 | 95.04000 | 12.6076680 | 4 |
| rayyan-33187617 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 118.70000 | 118.70000 | 118.70000 | NA | 1 |
| rayyan-33187620 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 125.02000 | 125.02000 | 125.02000 | NA | 1 |
| rayyan-33187626 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 62.15736 | 81.36067 | 52.50532 | 13.0214582 | 4 |
| rayyan-33187693 | 94.70000 | 94.70000 | 94.70000 | NA | NA | NA | NA | NA | NA | NA | NA | NA | 85.000000 | 85.0000 | 85.00000 | NA | NA | NA | NA | NA | 1 |
| rayyan-33187708 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 361.00000 | 363.20000 | 358.80000 | 3.1112698 | 2 |
| rayyan-33187720 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 111.45000 | 113.30000 | 109.60000 | 2.6162951 | 2 |
| rayyan-33187788 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 151.08000 | 151.08000 | 151.08000 | NA | 1 |
| rayyan-33189096 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 1 |
| rayyan-33189101 | 68.38000 | 68.38000 | 68.38000 | NA | NA | NA | NA | NA | 11.350000 | 11.350000 | 11.350000 | NA | NA | NA | NA | NA | NA | NA | NA | NA | 1 |
| rayyan-33189111 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 180.07500 | 187.81000 | 172.34000 | 10.9389419 | 2 |
| rayyan-33189116 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 385.00000 | 385.00000 | 385.00000 | NA | 1 |
| rayyan-33189128 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 116.36000 | 116.36000 | 116.36000 | NA | 1 |
| rayyan-33189137 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 64.83871 | 64.83871 | 64.83871 | NA | 1 |
| rayyan-33189143 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 62.97000 | 62.97000 | 62.97000 | NA | 1 |
| rayyan-33189147 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 100.05833 | 105.50000 | 93.30000 | 3.1454031 | 12 |
| rayyan-33189201 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 186.79000 | 186.79000 | 186.79000 | NA | 1 |
| rayyan-33189260 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 61.17500 | 65.53000 | 57.42000 | 3.6778844 | 4 |
| rayyan-33189310 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 60.00000 | 60.00000 | 60.00000 | NA | 1 |
| rayyan-33189386 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 82.04000 | 82.04000 | 82.04000 | NA | 1 |
| rayyan-33189505 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 141.13000 | 141.13000 | 141.13000 | NA | 1 |
| rayyan-33189548 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 120.00000 | 120.00000 | 120.00000 | NA | 1 |
| rayyan-33189575 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 112.00000 | 112.00000 | 112.00000 | NA | 1 |
| rayyan-33189578 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 81.60000 | 81.60000 | 81.60000 | NA | 1 |
| rayyan-33189584 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 86.36000 | 86.36000 | 86.36000 | NA | 1 |
| rayyan-33189611 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 160.30000 | 160.30000 | 160.30000 | NA | 1 |
| rayyan-33189616 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 1537.70000 | 1537.70000 | 1537.70000 | NA | 1 |
| rayyan-33189628 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 14.54500 | 14.70000 | 14.39000 | 0.2192031 | 2 |
| rayyan-33189689 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 440.00000 | 460.00000 | 410.00000 | 26.4575131 | 3 |
| rayyan-33189783 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 254.92000 | 254.92000 | 254.92000 | NA | 1 |
| rayyan-33189793 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 174.47496 | 178.02908 | 170.92084 | 5.0262847 | 2 |
| rayyan-33189812 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 129.90000 | 129.90000 | 129.90000 | NA | 1 |
| rayyan-33189873 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 106.37000 | 106.37000 | 106.37000 | NA | 1 |
| rayyan-33197791 | 89.64000 | 89.64000 | 89.64000 | NA | NA | NA | NA | NA | 8.580000 | 8.580000 | 8.580000 | NA | NA | NA | NA | NA | NA | NA | NA | NA | 1 |
| rayyan-33197855 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 132.84000 | 132.84000 | 132.84000 | NA | 1 |
| rayyan-33199533 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 130.74000 | 130.74000 | 130.74000 | NA | 1 |
| rayyan-36980058 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 148.20000 | 182.30000 | 122.20000 | 30.8579001 | 3 |
| rayyan-37034231 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 244.08333 | 294.14000 | 179.83000 | 58.4624070 | 3 |
| rayyan-37034386 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 139.88000 | 139.88000 | 139.88000 | NA | 1 |
| rayyan-37038224 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 355.00000 | 416.00000 | 294.00000 | 86.2670273 | 2 |
plot_histogram(dat)
I inspected each cell size trait separately. Based on the frequency histogram of each trait (on a log10 scale), I set lower and upper thresholds. These thresholds are somewhat arbitrary, but they flag values in the tails of each distribution for manual inspection. Where an error was detected, I returned to the original source and rechecked the data thoroughly.
dat %>%
filter(is.finite(cell_area)) %>%
ggplot(aes(log10(cell_area))) +
geom_histogram(fill = "firebrick", col = "black", binwidth = 0.05) +
theme_classic() +
labs(title = "Cell Area")
dat %>%
filter(is.finite(cell_area)) %>%
mutate(log10_cell_area = log10(cell_area)) %>% # Calculate log10(cell_area)
filter(log10_cell_area < 1.3 | log10_cell_area > 2.9) %>% # Apply thresholds, based on data distribution
select(key, class, species_reported, log10_cell_area, cell_area) %>% # Select relevant columns
arrange(key, cell_area)
## key class species_reported
## 1 pdf_not_found_Gulliver_1875 Dipnoi Protopterus annectens annectens
## 2 rayyan-33186293 Actinopterygii Iranocichla hormuzensis
## log10_cell_area cell_area
## 1 2.975294 944.70
## 2 1.210051 16.22
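The same plot-then-filter check is repeated for every trait in the remainder of this section. As a minimal sketch only, the filtering step could be wrapped in one function. The name `flag_outliers`, its arguments, and the toy data frame are hypothetical and are not part of the original script; the two flagged values (16.22 and 944.70) are taken from the output above.

```r
library(dplyr)

# Hypothetical helper (not part of the original script): list records whose
# log10-transformed trait value lies below `lower` or above `upper`.
flag_outliers <- function(data, trait, lower, upper) {
  data %>%
    filter(is.finite(.data[[trait]])) %>%
    mutate(log10_value = log10(.data[[trait]])) %>%
    filter(log10_value < lower | log10_value > upper) %>%
    select(key, class, species_reported, log10_value, all_of(trait)) %>%
    arrange(key, .data[[trait]])
}

# Toy data with made-up keys and species names
toy <- data.frame(
  key = c("a", "b", "c"),
  class = "Actinopterygii",
  species_reported = c("sp1", "sp2", "sp3"),
  cell_area = c(16.22, 80, 944.7)
)

# Same thresholds as used for cell area above; rows "a" and "c" are flagged
flagged <- flag_outliers(toy, "cell_area", lower = 1.3, upper = 2.9)
flagged
```

Keeping the thresholds as explicit arguments preserves the trait-by-trait judgement described above while removing the copy-and-paste between traits.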
ggplot(dat %>% filter(is.finite(cell_volume)), aes(log10(cell_volume))) +
geom_histogram(fill = "firebrick", color = "black", binwidth = 0.05) +
theme_classic() +
labs(title = "Cell Volume")
dat %>%
filter(is.finite(cell_volume)) %>%
mutate(log10_cell_volume = log10(cell_volume)) %>% # Calculate log10(cell_volume)
filter(log10_cell_volume < 2 | log10_cell_volume > 3) %>% # Apply thresholds, based on data distribution
select(key, class, species_reported, log10_cell_volume, cell_volume) %>% # Select relevant columns
arrange(key, cell_volume)
## key class species_reported log10_cell_volume
## 1 rayyan-33185780 Actinopterygii Pygocentrus nattereri 3.066334
## 2 rayyan-33185780 Actinopterygii Pygocentrus nattereri 3.080239
## 3 rayyan-33185780 Actinopterygii Pygocentrus nattereri 3.106438
## 4 rayyan-33185780 Actinopterygii Pygocentrus nattereri 3.116222
## 5 rayyan-33185780 Actinopterygii Synbranchus marmoratus 3.139796
## 6 rayyan-33185780 Actinopterygii Pygocentrus nattereri 3.191964
## 7 rayyan-33185780 Actinopterygii Synbranchus marmoratus 3.196839
## 8 rayyan-33185780 Actinopterygii Synbranchus marmoratus 3.201626
## 9 rayyan-33185780 Actinopterygii Synbranchus marmoratus 3.236037
## 10 rayyan-33185780 Actinopterygii Synbranchus marmoratus 3.276326
## 11 rayyan-33185803 Actinopterygii Crenilabrus tinca 1.738781
## 12 rayyan-33185803 Actinopterygii Uranoscopus scaber 1.903090
## 13 rayyan-33185803 Actinopterygii Gaidropsarus mediterraneus 1.920645
## 14 rayyan-33186293 Actinopterygii Iranocichla hormuzensis 1.613630
## cell_volume
## 1 1165.023
## 2 1202.926
## 3 1277.725
## 4 1306.839
## 5 1379.736
## 6 1555.835
## 7 1573.400
## 8 1590.838
## 9 1722.017
## 10 1889.407
## 11 54.800
## 12 80.000
## 13 83.300
## 14 41.080
dat %>%
filter(is.finite(nucleus_area)) %>%
ggplot(aes(log10(nucleus_area))) +
geom_histogram(fill = "firebrick", col = "black", binwidth = 0.05) +
theme_classic() +
labs(title = "Nucleus Area")
dat %>%
filter(is.finite(nucleus_area)) %>%
mutate(log10_nucleus_area = log10(nucleus_area)) %>% # Calculate log10(nucleus_area)
filter(log10_nucleus_area < 0.6 | log10_nucleus_area > 2) %>% # Apply thresholds, based on data distribution
select(key, class, species_reported, log10_nucleus_area, nucleus_area) %>% # Select relevant columns
arrange(key, nucleus_area)
## key class species_reported
## 1 9_gregory Chondrichthyes Oxynotus bruniensis
## 2 9_gregory Chondrichthyes Centroscymnus owstoni
## 3 9_gregory Chondrichthyes Centroscymnus coelolepis
## 4 9_gregory Chondrichthyes Etmopterus granulosus
## 5 9_gregory Chondrichthyes Squatina australis
## 6 9_gregory Chondrichthyes Etmopterus brachyurus
## 7 9_gregory Chondrichthyes Centroscymnus crepidater
## 8 9_gregory Chondrichthyes Centroscymnus plunketi
## 9 pdf_not_found_Gulliver_1875 Dipnoi Protopterus annectens annectens
## 10 rayyan-33186100 Actinopterygii Upeneus maculatus
## 11 rayyan-33186100 Actinopterygii Balistes capriscus
## 12 rayyan-33186100 Actinopterygii Bathystoma aurilineatum
## 13 rayyan-33186100 Actinopterygii Calamus calamus
## 14 rayyan-33186211 Actinopterygii Micropterus coosae
## 15 rayyan-33186211 Actinopterygii Perca flavescens
## 16 rayyan-33186293 Actinopterygii Iranocichla hormuzensis
## log10_nucleus_area nucleus_area
## 1 2.0440692 110.680000
## 2 2.0936668 124.070000
## 3 2.1041114 127.090000
## 4 2.1285931 134.460000
## 5 2.1537844 142.490000
## 6 2.1612482 144.960000
## 7 2.1689981 147.570000
## 8 2.1968115 157.330000
## 9 2.0795068 120.090000
## 10 0.4471580 2.800000
## 11 0.5185139 3.300000
## 12 0.5314789 3.400000
## 13 0.5797836 3.800000
## 14 0.4995055 3.158679
## 15 0.5575878 3.610670
## 16 0.4082400 2.560000
dat %>%
filter(is.finite(nucleus_volume)) %>%
ggplot(aes(log10(nucleus_volume))) +
geom_histogram(fill = "firebrick", col = "black", binwidth = 0.05) +
theme_classic() +
labs(title = "Nucleus Volume")
dat %>%
filter(is.finite(nucleus_volume)) %>%
mutate(log10_nucleus_volume = log10(nucleus_volume)) %>% # Calculate log10(nucleus_volume)
filter(log10_nucleus_volume < 0.8 | log10_nucleus_volume > 2.5) %>% # Apply thresholds, based on data distribution
select(key, class, species_reported, log10_nucleus_volume, nucleus_volume) %>% # Select relevant columns
arrange(key, nucleus_volume)
## key class species_reported log10_nucleus_volume
## 1 rayyan-33186293 Actinopterygii Iranocichla hormuzensis 0.5340261
## nucleus_volume
## 1 3.42
dat %>%
filter(is.finite(cell_length)) %>%
ggplot(aes(log10(cell_length))) +
geom_histogram(fill = "firebrick", col = "black", binwidth = 0.05) +
theme_classic() +
labs(title = "Cell Length")
dat %>%
filter(is.finite(cell_length)) %>%
mutate(log10_cell_length = log10(cell_length)) %>% # Calculate log10(cell_length)
filter(log10_cell_length < 0.8 | log10_cell_length > 1.5) %>% # Apply thresholds, based on data distribution
select(key, location_description, class, species_reported, log10_cell_length, cell_length) %>% # Select relevant columns
arrange(key, location_description, cell_length)
## key location_description
## 1 3_gregory <NA>
## 2 pdf_not_found_Gulliver_1875 <NA>
## 3 pdf_not_found_Gulliver_1875 <NA>
## 4 pdf_not_found_Gulliver_1875 <NA>
## 5 rayyan-33186293 Mehran river, Iran
## 6 rayyan-33186496 Libong Island, Trang Province, Thailand
## 7 rayyan-33186496 Rajamangala Beach, Trang Province, Thailand
## class species_reported log10_cell_length cell_length
## 1 Dipnoi Ceratodus forsteri 1.5910646 39.00
## 2 Chondrichthyes Oxynotus centrina 1.5024271 31.80
## 3 Chondrichthyes Torpedo torpedo 1.5024271 31.80
## 4 Dipnoi Protopterus annectens annectens 1.6493349 44.60
## 5 Actinopterygii Iranocichla hormuzensis 0.7411516 5.51
## 6 Actinopterygii Gerres filamentosus 0.7888751 6.15
## 7 Actinopterygii Leiognathus decorus 0.7972675 6.27
dat %>%
filter(is.finite(cell_width)) %>%
ggplot(aes(log10(cell_width))) +
geom_histogram(fill = "firebrick", col = "black", binwidth = 0.05) +
theme_classic() +
labs(title = "Cell Width")
dat %>%
filter(is.finite(cell_width)) %>%
mutate(log10_cell_width = log10(cell_width)) %>% # Calculate log10(cell_width)
filter(log10_cell_width < 0.6 | log10_cell_width > 1.3) %>% # Apply thresholds, based on data distribution
select(key, class, species_reported, log10_cell_width, cell_width) %>% # Select relevant columns
arrange(key, cell_width)
## key class species_reported
## 1 3_gregory Dipnoi Ceratodus forsteri
## 2 pdf_not_found_Gulliver_1875 Chondrichthyes Oxynotus centrina
## 3 pdf_not_found_Gulliver_1875 Chondrichthyes Torpedo torpedo
## 4 pdf_not_found_Gulliver_1875 Dipnoi Protopterus annectens annectens
## 5 pdf_not_found_Kisch_1951 Chondrichthyes Torpedo nobiliana
## 6 rayyan-33186293 Actinopterygii Iranocichla hormuzensis
## log10_cell_width cell_width
## 1 1.3802112 24.00
## 2 1.4048337 25.40
## 3 1.4048337 25.40
## 4 1.4313638 27.00
## 5 1.3654880 23.20
## 6 0.5728716 3.74
dat %>%
filter(is.finite(nucleus_length)) %>%
ggplot(aes(log10(nucleus_length))) +
geom_histogram(fill = "firebrick", col = "black", binwidth = 0.05) +
theme_classic() +
labs(title = "Nucleus Length")
dat %>%
filter(is.finite(nucleus_length)) %>%
mutate(log10_nucleus_length = log10(nucleus_length)) %>% # Calculate log10(nucleus_length)
filter(log10_nucleus_length < 0.5 | log10_nucleus_length > 1) %>% # Apply thresholds, based on data distribution
select(key, class, species_reported, log10_nucleus_length, nucleus_length) %>% # Select relevant columns
arrange(key, nucleus_length)
## key class species_reported
## 1 3_gregory Dipnoi Ceratodus forsteri
## 2 pdf_not_found_Gulliver_1875 Dipnoi Protopterus annectens annectens
## 3 rayyan-33186293 Actinopterygii Iranocichla hormuzensis
## 4 rayyan-33186496 Actinopterygii Leiognathus decorus
## log10_nucleus_length nucleus_length
## 1 1.1461280 14.00
## 2 1.2430380 17.50
## 3 0.3502480 2.24
## 4 0.4955443 3.13
dat %>%
filter(is.finite(nucleus_width)) %>%
ggplot(aes(log10(nucleus_width))) +
geom_histogram(fill = "firebrick", col = "black", binwidth = 0.05) +
theme_classic() +
labs(title = "Nucleus Width")
dat %>%
filter(is.finite(nucleus_width)) %>%
mutate(log10_nucleus_width = log10(nucleus_width)) %>% # Calculate log10(nucleus_width)
filter(log10_nucleus_width < 0.25 | log10_nucleus_width > 0.85) %>% # Apply thresholds, based on data distribution
select(key, class, species_reported, log10_nucleus_width, nucleus_width) %>% # Select relevant columns
arrange(key, nucleus_width)
## key class species_reported
## 1 3_gregory Dipnoi Ceratodus forsteri
## 2 pdf_not_found_Gulliver_1875 Dipnoi Protopterus annectens annectens
## 3 rayyan-33186293 Actinopterygii Iranocichla hormuzensis
## log10_nucleus_width nucleus_width
## 1 0.9890046 9.75
## 2 0.9444827 8.80
## 3 0.1583625 1.44
dat %>%
filter(is.finite(mcv)) %>%
ggplot(aes(log10(mcv))) +
geom_histogram(fill = "firebrick", col = "black", binwidth = 0.1) +
theme_classic() +
labs(title = "Mean Corpuscular Volume")
dat %>%
filter(is.finite(mcv)) %>%
mutate(log10_mcv = log10(mcv)) %>% # Calculate log10(mcv)
filter(log10_mcv < 1 | log10_mcv > 3) %>% # Apply thresholds, based on data distribution
select(key, class, species_reported, log10_mcv, mcv) %>% # Select relevant columns
arrange(key, mcv)
## key class species_reported log10_mcv mcv
## 1 rayyan-33186305 Dipnoi Protopterus aethiopicus 3.841359 6940.00000
## 2 rayyan-33186404 Actinopterygii Heterotis niloticus -1.718512 0.01912
## 3 rayyan-33189616 Chondrichthyes Scyliorhinus canicula 3.186872 1537.70000
For one study (rayyan-33186404), the decimal notation (full stops versus commas) was inconsistent. I therefore recalculated the MCV from the haematocrit and erythrocyte count values using the formula described in the main text of our manuscript, which yielded approximately 187.1622 μm³. I also set the corresponding MCV error to NA.
dat <- dat %>%
mutate(
mcv = ifelse(species_reported == "Heterotis niloticus" & key == "rayyan-33186404", 187.1622, mcv),
mcv_error = ifelse(species_reported == "Heterotis niloticus" & key == "rayyan-33186404", NA, mcv_error)
)
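The manuscript's formula is not reproduced in this document. As a rough illustration only, the conventional haematological relationship, MCV (μm³, equivalent to fL) = haematocrit (%) × 10 / erythrocyte count (10⁶ cells per μL), can be sketched as follows. The function name and the input values are hypothetical, and it is an assumption that the manuscript uses this same form.

```r
# Hypothetical helper: conventional MCV calculation.
# hct_percent        : haematocrit as a percentage (e.g. 30 for 30 %)
# rbc_million_per_ul : erythrocyte count in 10^6 cells per microlitre
mcv_from_hct <- function(hct_percent, rbc_million_per_ul) {
  (hct_percent * 10) / rbc_million_per_ul
}

# Made-up inputs, used only to show the arithmetic: 30 * 10 / 1.5 = 200
mcv_example <- mcv_from_hct(hct_percent = 30, rbc_million_per_ul = 1.5)
mcv_example
```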
After making this adjustment, I noticed that two distinct studies in the database appear to report the same MCV measurement: both come from a similar group of authors, and the mean body mass of the fish is identical. This corroborates the problem with the decimal notation, so the correct MCV value is 191.2.
dat %>%
filter(species_reported == "Heterotis niloticus") %>%
select(key, species_reported, body_mass_gram, mcv, mcv_error)
## key species_reported body_mass_gram mcv mcv_error
## 1 rayyan-33186379 Heterotis niloticus 429.4 191.2000 13.74
## 2 rayyan-33186404 Heterotis niloticus 429.4 187.1622 NA
As a result, the study containing the error (rayyan-33186404) was excluded from the database. In Figure 1, this study will be labelled under “RBC expressed in wrong units”.
# Let's exclude that study
dat <- dat %>%
filter(key != "rayyan-33186404")
# Let's check the MCV data again
dat %>%
filter(is.finite(mcv)) %>%
ggplot(aes(log10(mcv))) +
geom_histogram(fill = "firebrick", col = "black", binwidth = 0.1) +
theme_classic() +
labs(title = "Mean Corpuscular Volume")
dat %>%
filter(is.finite(mcv)) %>%
mutate(log10_mcv = log10(mcv)) %>% # Calculate log10(mcv)
filter(log10_mcv < 1.2 | log10_mcv > 3) %>% # Apply thresholds, based on data distribution
select(key, class, species_reported, log10_mcv, mcv) %>% # Select relevant columns
arrange(key, mcv)
## key class species_reported log10_mcv mcv
## 1 rayyan-33186305 Dipnoi Protopterus aethiopicus 3.841359 6940.00
## 2 rayyan-33189616 Chondrichthyes Scyliorhinus canicula 3.186872 1537.70
## 3 rayyan-33189628 Actinopterygii Solea senegalensis 1.158061 14.39
## 4 rayyan-33189628 Actinopterygii Solea senegalensis 1.167317 14.70
I also calculated cell area, cell volume, nuclear area, and nuclear volume from the available length and width data. This approach derives these parameters when direct measurements are unavailable, thereby enhancing the completeness of ErythroCite.
For this, I employed standard formulae to calculate the area and the volume of the cell or its nucleus, assuming that both the cell and its nucleus were shaped like ellipsoids or oblate spheroids (Benfey & Sutterlin, 1984; Gregory, 2024).
The formula for cellular area (A) is:
\[A = \pi \times \frac{a}{2} \times \frac{b}{2}\]
# Calculate cell area and volume if missing
dat$cell_area <- ifelse(is.na(dat$cell_area),
pi * (dat$cell_length/2) * (dat$cell_width/2),
dat$cell_area)
dat$cell_volume <- ifelse(is.na(dat$cell_volume),
(4/3) * pi * (dat$cell_length/2) * (dat$cell_width/2)^2,
dat$cell_volume)
The formula for cellular volume (V) is:
\[V = \frac{4}{3} \times \pi \times \frac{a}{2} \times \left(\frac{b}{2}\right)^2\]
where \(a\) and \(b\) denote the length (major axis) and width (minor axis) of the cell or its nucleus, so that \(a/2\) and \(b/2\) are the semi-major and semi-minor axes of the ellipse, respectively.
# Calculate nuclear area and volume if missing
dat$nucleus_area <- ifelse(is.na(dat$nucleus_area),
pi * (dat$nucleus_length/2) * (dat$nucleus_width/2),
dat$nucleus_area)
dat$nucleus_volume <- ifelse(is.na(dat$nucleus_volume),
(4/3) * pi * (dat$nucleus_length/2) * (dat$nucleus_width/2)^2,
dat$nucleus_volume)
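As a quick check of the two formulae, the sketch below wraps them in two small helper functions and applies them to one illustrative cell. The function names (`ellipse_area`, `spheroid_volume`) and the dimensions are hypothetical and are not part of the original script; they only mirror the expressions used above.

```r
# Hypothetical helpers (not in the original script). 'len' and 'wid' are the
# full length and width, so the semi-axes are len / 2 and wid / 2.
ellipse_area <- function(len, wid) {
  pi * (len / 2) * (wid / 2)
}

spheroid_volume <- function(len, wid) {
  (4 / 3) * pi * (len / 2) * (wid / 2)^2
}

# Illustrative values only: a cell 12 micrometres long and 8 micrometres wide
example_area   <- ellipse_area(12, 8)     # pi * 6 * 4
example_volume <- spheroid_volume(12, 8)  # (4 / 3) * pi * 6 * 4^2

# ifelse() keeps reported values and fills in only the missing ones
reported_area <- c(80, NA)
filled_area   <- ifelse(is.na(reported_area), example_area, reported_area)
filled_area
```

The first element keeps the reported value (80), whereas the second is filled with the derived area, which is the same logic applied to `dat$cell_area` and `dat$nucleus_area`.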
I will apply transformations to the error columns (cell_length_error, cell_width_error, cell_area_error, cell_volume_error, mcv_error, nucleus_length_error, nucleus_width_error, nucleus_area_error, and nucleus_volume_error) based on the estimate_error_type column:
unique(dat$estimate_error_type)
## [1] NA "2SE" "SD" "SE" "95_CI"
For these types of error, the transformations are based on well-known formulae:
SD (Standard Deviation): The value remains unchanged if the error type is “SD”.
SE (Standard Error): The error is converted to SD using the formula:
\[ \text{SD} = \text{SE} \times \sqrt{N} \]
where \(N\) is the number of specimens used for the trait mean estimate.
2SE (Two Standard Errors): The error is halved to obtain the SE and then converted to SD using the formula:
\[ \text{SD} = \left(\frac{\text{2SE}}{2}\right) \times \sqrt{N} = \text{SE} \times \sqrt{N} \]
95% CI (95% Confidence Interval): The error is divided by 1.96 to obtain the SE and then converted to SD using the formula:
\[ \text{SD} = \left(\frac{\text{95% CI}}{1.96}\right) \times \sqrt{N} \]
where \(N\) is the number of specimens, and \(1.96\) corresponds to the Z-score for a 95% confidence interval.
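A small numerical example may help to see that the three conversions are consistent with each other. The values below are invented for illustration, and the 95% CI value is taken to be the half-width of the interval, which is an assumption on my side rather than something stated in the source studies.

```r
# Illustrative values only (not taken from ErythroCite)
n <- 16            # hypothetical number of specimens
se_value   <- 0.5  # error reported as "SE"
two_se     <- 1.0  # error reported as "2SE"
ci95_value <- 0.98 # error reported as "95_CI" (assumed to be the half-width)

sd_from_se   <- se_value * sqrt(n)
sd_from_2se  <- (two_se / 2) * sqrt(n)
sd_from_ci95 <- (ci95_value / 1.96) * sqrt(n)

c(SE = sd_from_se, `2SE` = sd_from_2se, `95_CI` = sd_from_ci95)
```

Because the three inputs describe the same underlying SE of 0.5, each conversion gives an SD of 2.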
# transform 'number_of_specimens' to numeric
dat$number_of_specimens <- as.numeric(as.character(dat$number_of_specimens))
# Apply transformation to all relevant error columns
dat <- dat %>%
mutate(
# Transforming cell_length_error to SD
cell_length_sd = case_when(
estimate_error_type == "SD" ~ cell_length_error, # Keep SD values unchanged
estimate_error_type == "SE" & !is.na(number_of_specimens) & number_of_specimens > 1 ~ cell_length_error * sqrt(number_of_specimens), # Convert SE to SD
estimate_error_type == "2SE" & !is.na(number_of_specimens) & number_of_specimens > 1 ~ (cell_length_error / 2) * sqrt(number_of_specimens), # Convert 2SE to SD
estimate_error_type == "95_CI" & !is.na(number_of_specimens) & number_of_specimens > 1 ~ (cell_length_error / 1.96) * sqrt(number_of_specimens), # Convert 95% CI to SD
TRUE ~ NA_real_ # Assign NA if conversion is not possible
),
# Transforming cell_width_error to SD
cell_width_sd = case_when(
estimate_error_type == "SD" ~ cell_width_error, # Keep SD values unchanged
estimate_error_type == "SE" & !is.na(number_of_specimens) & number_of_specimens > 1 ~ cell_width_error * sqrt(number_of_specimens), # Convert SE to SD
estimate_error_type == "2SE" & !is.na(number_of_specimens) & number_of_specimens > 1 ~ (cell_width_error / 2) * sqrt(number_of_specimens), # Convert 2SE to SD
estimate_error_type == "95_CI" & !is.na(number_of_specimens) & number_of_specimens > 1 ~ (cell_width_error / 1.96) * sqrt(number_of_specimens), # Convert 95% CI to SD
TRUE ~ NA_real_ # Assign NA if conversion is not possible
),
# Transforming cell_area_error to SD
cell_area_sd = case_when(
estimate_error_type == "SD" ~ cell_area_error, # Keep SD values unchanged
estimate_error_type == "SE" & !is.na(number_of_specimens) & number_of_specimens > 1 ~ cell_area_error * sqrt(number_of_specimens), # Convert SE to SD
estimate_error_type == "2SE" & !is.na(number_of_specimens) & number_of_specimens > 1 ~ (cell_area_error / 2) * sqrt(number_of_specimens), # Convert 2SE to SD
estimate_error_type == "95_CI" & !is.na(number_of_specimens) & number_of_specimens > 1 ~ (cell_area_error / 1.96) * sqrt(number_of_specimens), # Convert 95% CI to SD
TRUE ~ NA_real_ # Assign NA if conversion is not possible
),
# Transforming cell_volume_error to SD
cell_volume_sd = case_when(
estimate_error_type == "SD" ~ cell_volume_error, # Keep SD values unchanged
estimate_error_type == "SE" & !is.na(number_of_specimens) & number_of_specimens > 1 ~ cell_volume_error * sqrt(number_of_specimens), # Convert SE to SD
estimate_error_type == "2SE" & !is.na(number_of_specimens) & number_of_specimens > 1 ~ (cell_volume_error / 2) * sqrt(number_of_specimens), # Convert 2SE to SD
estimate_error_type == "95_CI" & !is.na(number_of_specimens) & number_of_specimens > 1 ~ (cell_volume_error / 1.96) * sqrt(number_of_specimens), # Convert 95% CI to SD
TRUE ~ NA_real_ # Assign NA if conversion is not possible
),
# Transforming mcv_error to SD
mcv_sd = case_when(
estimate_error_type == "SD" ~ mcv_error, # Keep SD values unchanged
estimate_error_type == "SE" & !is.na(number_of_specimens) & number_of_specimens > 1 ~ mcv_error * sqrt(number_of_specimens), # Convert SE to SD
estimate_error_type == "2SE" & !is.na(number_of_specimens) & number_of_specimens > 1 ~ (mcv_error / 2) * sqrt(number_of_specimens), # Convert 2SE to SD
estimate_error_type == "95_CI" & !is.na(number_of_specimens) & number_of_specimens > 1 ~ (mcv_error / 1.96) * sqrt(number_of_specimens), # Convert 95% CI to SD
TRUE ~ NA_real_ # Assign NA if conversion is not possible
),
# Transforming nucleus_length_error to SD
nucleus_length_sd = case_when(
estimate_error_type == "SD" ~ nucleus_length_error, # Keep SD values unchanged
estimate_error_type == "SE" & !is.na(number_of_specimens) & number_of_specimens > 1 ~ nucleus_length_error * sqrt(number_of_specimens), # Convert SE to SD
estimate_error_type == "2SE" & !is.na(number_of_specimens) & number_of_specimens > 1 ~ (nucleus_length_error / 2) * sqrt(number_of_specimens), # Convert 2SE to SD
estimate_error_type == "95_CI" & !is.na(number_of_specimens) & number_of_specimens > 1 ~ (nucleus_length_error / 1.96) * sqrt(number_of_specimens), # Convert 95% CI to SD
TRUE ~ NA_real_ # Assign NA if conversion is not possible
),
# Transforming nucleus_width_error to SD
nucleus_width_sd = case_when(
estimate_error_type == "SD" ~ nucleus_width_error, # Keep SD values unchanged
estimate_error_type == "SE" & !is.na(number_of_specimens) & number_of_specimens > 1 ~ nucleus_width_error * sqrt(number_of_specimens), # Convert SE to SD
estimate_error_type == "2SE" & !is.na(number_of_specimens) & number_of_specimens > 1 ~ (nucleus_width_error / 2) * sqrt(number_of_specimens), # Convert 2SE to SD
estimate_error_type == "95_CI" & !is.na(number_of_specimens) & number_of_specimens > 1 ~ (nucleus_width_error / 1.96) * sqrt(number_of_specimens), # Convert 95% CI to SD
TRUE ~ NA_real_ # Assign NA if conversion is not possible
),
# Transforming nucleus_area_error to SD
nucleus_area_sd = case_when(
estimate_error_type == "SD" ~ nucleus_area_error, # Keep SD values unchanged
estimate_error_type == "SE" & !is.na(number_of_specimens) & number_of_specimens > 1 ~ nucleus_area_error * sqrt(number_of_specimens), # Convert SE to SD
estimate_error_type == "2SE" & !is.na(number_of_specimens) & number_of_specimens > 1 ~ (nucleus_area_error / 2) * sqrt(number_of_specimens), # Convert 2SE to SD
estimate_error_type == "95_CI" & !is.na(number_of_specimens) & number_of_specimens > 1 ~ (nucleus_area_error / 1.96) * sqrt(number_of_specimens), # Convert 95% CI to SD
TRUE ~ NA_real_ # Assign NA if conversion is not possible
),
# Transforming nucleus_volume_error to SD
nucleus_volume_sd = case_when(
estimate_error_type == "SD" ~ nucleus_volume_error, # Keep SD values unchanged
estimate_error_type == "SE" & !is.na(number_of_specimens) & number_of_specimens > 1 ~ nucleus_volume_error * sqrt(number_of_specimens), # Convert SE to SD
estimate_error_type == "2SE" & !is.na(number_of_specimens) & number_of_specimens > 1 ~ (nucleus_volume_error / 2) * sqrt(number_of_specimens), # Convert 2SE to SD
estimate_error_type == "95_CI" & !is.na(number_of_specimens) & number_of_specimens > 1 ~ (nucleus_volume_error / 1.96) * sqrt(number_of_specimens), # Convert 95% CI to SD
TRUE ~ NA_real_ # Assign NA if conversion is not possible
)
)
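The nine `case_when()` blocks above apply the same rule to each error column. A more compact alternative, shown here only as a sketch on toy data, is to put the rule in one helper and apply it with `across()`. The helper name `to_sd` and the toy tibble are hypothetical; the conversion rules are the ones already used above.

```r
library(dplyr)

# Hypothetical helper: convert a reported error to SD given its type and N
to_sd <- function(error, type, n) {
  can_convert <- !is.na(n) & n > 1
  case_when(
    type == "SD" ~ error,                                   # keep SD unchanged
    type == "SE" & can_convert ~ error * sqrt(n),           # SE to SD
    type == "2SE" & can_convert ~ (error / 2) * sqrt(n),    # 2SE to SD
    type == "95_CI" & can_convert ~ (error / 1.96) * sqrt(n), # 95% CI to SD
    TRUE ~ NA_real_                                         # not convertible
  )
}

# Toy data standing in for 'dat' (values are illustrative only)
toy <- tibble(
  estimate_error_type = c("SD", "SE", "2SE", "95_CI", NA),
  number_of_specimens = c(10, 16, 16, 16, 16),
  cell_length_error   = c(1.5, 0.5, 1.0, 0.98, 0.3),
  mcv_error           = c(20, 5, 10, 9.8, 3)
)

# Apply the helper to every column ending in "_error" and name the result
# "<trait>_sd", e.g. cell_length_error becomes cell_length_sd
toy_sd <- toy %>%
  mutate(across(
    ends_with("_error"),
    ~ to_sd(.x, estimate_error_type, number_of_specimens),
    .names = "{sub('_error$', '', .col)}_sd"
  ))

toy_sd
```

In this sketch, rows with a missing error type end up with `NA` in the new `_sd` columns, as in the longer version above.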
# lines below take some time to run
dat <- dat %>%
reverse_geocode(lat = lat_dec,
long = long_dec,
method = "osm") %>%
mutate(country_collection = sub(".*,\\s*", "", address)) # Extract country name
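To illustrate what the regular expression in `sub()` does, the sketch below applies it to a few made-up address strings written in the comma-separated style that reverse geocoding typically returns. The addresses are hypothetical and are not taken from the database.

```r
# Hypothetical address strings (illustrative only)
addresses <- c(
  "Bremerhaven, Bremen, 27570, Deutschland",
  "Valdivia, Los Rios, Chile",
  "Japan"
)

# ".*," greedily consumes everything up to the LAST comma and
# "\\s*" then drops the whitespace that follows it
countries <- sub(".*,\\s*", "", addresses)
countries
```

Only the last comma-separated element is kept, and a string without any comma is returned unchanged.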
Many of the country names are in their native languages; therefore, I changed all the names to their official English versions.
dat <- dat %>%
mutate(country_collection = case_when(
country_collection == "Bermuda" ~ "United Kingdom",
country_collection == "Bosna i Hercegovina / Босна и Херцеговина" ~ "Bosnia and Herzegovina",
country_collection == "Brasil" ~ "Brazil",
country_collection == "Česko" ~ "Czechia",
country_collection == "Congo" ~ "Democratic Republic of the Congo",
country_collection == "Deutschland" ~ "Germany",
country_collection == "España" ~ "Spain",
country_collection == "Italia" ~ "Italy",
country_collection == "Mauritius / Maurice" ~ "Mauritius",
country_collection == "México" ~ "Mexico",
country_collection == "Perú" ~ "Peru",
country_collection == "Norge" ~ "Norway",
country_collection == "Polska" ~ "Poland",
country_collection == "Türkiye" ~ "Turkey",
country_collection == "Sverige" ~ "Sweden",
country_collection == "United States" ~ "United States of America",
country_collection == "Ελλάς" ~ "Greece",
country_collection == "Россия" ~ "Russia",
country_collection == "Україна" ~ "Ukraine",
country_collection == "العراق" ~ "Iraq",
country_collection == "پاکستان" ~ "Pakistan",
country_collection == "ایران" ~ "Iran",
country_collection == "سوريا" ~ "Syria",
country_collection == "مصر" ~ "Egypt",
country_collection == "عمان" ~ "Oman",
country_collection == "ประเทศไทย" ~ "Thailand",
country_collection == "বাংলাদেশ" ~ "Bangladesh",
country_collection == "대한민국" ~ "South Korea",
country_collection == "中国" ~ "China",
country_collection == "日本" ~ "Japan",
TRUE ~ country_collection
)) %>%
mutate(subcontinent = case_when(
country_collection %in% c("Argentina", "Brazil", "Chile", "Colombia", "Ecuador", "Peru", "Venezuela") ~ "South America", # South America
country_collection %in% c("United States of America", "Mexico", "Bermuda") ~ "North America", # North America; note that "Bermuda" was already recoded to "United Kingdom" above, so it never matches here
country_collection %in% c("United Kingdom", "Germany", "France", "Italy", "Spain", "Portugal", "Norway", "Sweden") ~ "Western Europe", # Western Europe
country_collection %in% c("Russia", "Ukraine", "Poland", "Czechia", "Greece", "Bosnia and Herzegovina") ~ "Eastern Europe", # Eastern Europe
country_collection %in% c("Iran", "Iraq", "Turkey", "Oman", "Syria") ~ "Middle East", # Middle East
country_collection %in% c("India", "Pakistan", "Bangladesh") ~ "South Asia", # South Asia
country_collection %in% c("China", "Japan", "South Korea") ~ "East Asia", # East Asia
country_collection %in% c("Malaysia", "Thailand") ~ "Southeast Asia", # Southeast Asia
country_collection %in% c("Egypt", "Nigeria", "Mauritius", "Niger", "Democratic Republic of the Congo") ~ "Africa", # Africa
country_collection %in% c("Australia") ~ "Oceania", # Oceania
TRUE ~ country_collection # Default for unclassified countries
))
# check again
unique(dat$country_collection)
## [1] NA "United Kingdom"
## [3] "United States of America" "India"
## [5] "China" "Brazil"
## [7] "Mexico" "Czechia"
## [9] "Venezuela" "Japan"
## [11] "Chile" "Iran"
## [13] "Turkey" "Ukraine"
## [15] "Pakistan" "Nigeria"
## [17] "Egypt" "Russia"
## [19] "Argentina" "Australia"
## [21] "Poland" "Iraq"
## [23] "Democratic Republic of the Congo" "Bosnia and Herzegovina"
## [25] "Greece" "Italy"
## [27] "Ecuador" "Mauritius"
## [29] "Thailand" "Syria"
## [31] "Malaysia" "Bangladesh"
## [33] "South Korea" "Peru"
## [35] "Niger" "Colombia"
## [37] "Sweden" "Norway"
## [39] "Germany" "Spain"
## [41] "Portugal" "Oman"
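Because the default branch of the subcontinent recoding returns the country name itself, any country missing from the lists would silently appear as its own "subcontinent". The sketch below shows one possible way to list such cases, using a toy data frame in which "Canada" is a hypothetical unclassified country.

```r
# Toy stand-in for 'dat' (illustrative only)
toy <- data.frame(
  country_collection = c("Brazil", "Canada", NA),
  subcontinent       = c("South America", "Canada", NA)
)

# A country is unclassified when its subcontinent equals its own name
is_unclassified <- !is.na(toy$country_collection) &
  toy$subcontinent == toy$country_collection
unclassified <- unique(toy$country_collection[is_unclassified])
unclassified
```

An empty result would indicate that every country has been assigned to a subcontinent.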
# drop columns that are no longer needed
dat_cleaned <- dat %>%
select(-double_checked, -source, -isMarine, -isBrackish, -isFresh, -address)
# check column names before reordering
names(dat_cleaned)
## [1] "species_reported" "database" "key"
## [4] "body_mass_gram" "sex" "life_stage"
## [7] "lat_dec" "long_dec" "location_description"
## [10] "sample_size" "number_of_specimens" "estimate_error_type"
## [13] "cell_length" "cell_length_error" "cell_width"
## [16] "cell_width_error" "cell_area" "cell_area_error"
## [19] "cell_volume" "cell_volume_error" "mcv"
## [22] "mcv_error" "nucleus_length" "nucleus_length_error"
## [25] "nucleus_width" "nucleus_width_error" "nucleus_area"
## [28] "nucleus_area_error" "nucleus_volume" "nucleus_volume_error"
## [31] "notes" "phylum" "class"
## [34] "order" "family" "genus"
## [37] "species" "taxo_level" "realm"
## [40] "species_underscored" "cell_length_sd" "cell_width_sd"
## [43] "cell_area_sd" "cell_volume_sd" "mcv_sd"
## [46] "nucleus_length_sd" "nucleus_width_sd" "nucleus_area_sd"
## [49] "nucleus_volume_sd" "country_collection" "subcontinent"
# reorder columns and round numeric values
dat_cleaned <- dat_cleaned %>%
select(database, key, phylum, class, order, family, genus, species,species_reported, species_underscored,taxo_level,
realm,
lat_dec, long_dec, location_description, country_collection, subcontinent,
everything()) %>%
mutate(across(where(is.numeric), ~round(., digits = 4)))
# Panel A: Extracting and cleaning publication years:
df_years <- refs %>%
as.data.frame() %>%
select(year) %>%
mutate(year = as.numeric(as.character(year))) %>% # Ensure year is numeric
filter(!is.na(year)) # Remove NA values
# Counting studies per year and calculating cumulative values:
studies_per_year <- df_years %>%
group_by(year) %>%
summarise(num_studies = n()) %>%
arrange(year) %>%
mutate(cumulative_studies = cumsum(num_studies)) # Calculate cumulative count
# Plotting the cumulative number of studies:
plot_years <- ggplot(studies_per_year, aes(x = year, y = cumulative_studies)) +
geom_line(color = "#00AFBB", linewidth = 2) +
scale_y_continuous(limits = c(0, 200)) + # Limit the y-axis to 200
scale_x_continuous(
limits = c(1875, 2025),
breaks = seq(1875, 2025, by = 25) # Set x-axis intervals to 25 years
) +
labs(
x = "Publication Year",
y = "Number of Studies"
) +
theme_pubr() +
theme(
axis.title.x = element_text(face = "bold", size = 12),
axis.title.y = element_text(face = "bold", size = 12)
)
plot_years
# ------------------------------------------------------------------------------
# Panel B: Extracting and counting journals
df_journals <- refs %>%
as.data.frame() %>%
select(journal) %>%
filter(!is.na(journal)) %>% # Remove missing journal entries
group_by(journal) %>%
summarise(num_articles = n()) %>%
arrange(desc(num_articles)) %>%
slice_head(n = 15) # Select the top 15 journals
# Plotting the 15 most common journals
plot_journals <- ggplot(df_journals, aes(x = reorder(journal, num_articles), y = num_articles)) +
geom_bar(stat = "identity", fill = "#00AFBB", width = 0.7) +
scale_y_continuous(limits = c(0, 20)) + # Limit the y-axis to 20
coord_flip() +
theme_pubr() +
labs(
x = "Journal",
y = "Number of Studies"
) +
theme(
axis.title.x = element_text(face = "bold", size = 12),
axis.title.y = element_text(face = "bold", size = 12),
axis.text.y = element_text(size = 10) # Larger font size for journal names
)
plot_journals
# ------------------------------------------------------------------------------
# Combining Panel A and Panel B
Figure_2 <- plot_grid(
plot_years,
plot_journals,
labels = c("A", "B"),
nrow = 2,
ncol = 1,
label_size = 15
)
# Saving the combined figure
ggsave('../manuscript/Figure_2.pdf', Figure_2, width = 7, height = 9)
ggsave('../manuscript/Figure_2.png', Figure_2, width = 7, height = 9, dpi = 1200)
sort(unique(dat$realm))
## [1] "freshwater" "freshwater-brackish"
## [3] "freshwater-brackish-marine" "marine"
## [5] "marine-brackish"
fishualize("Hypleurochilus_fissicornis")
class_realm <- c("freshwater",
"freshwater-brackish",
"freshwater-brackish-marine",
"marine",
"marine-brackish")
# plot for the SEB meeting 2025
sort(unique(dat$class))
## [1] "Actinopterygii" "Chondrichthyes" "Cyclostomata" "Dipnoi"
class_type <- c("Actinopterygii",
"Chondrichthyes",
"Cyclostomata",
"Dipnoi")
# Let's use the fishualize package to colour the map (https://github.com/nschiett/fishualize)
fishualize("Scarus_quoyi")
filtered_data <- dat %>%
filter(between(long_dec, -180, 180), between(lat_dec, -90, 90))
cell_size_studies <-ggplot(filtered_data, aes(x = long_dec, y = lat_dec)) +
borders("world", colour = "black", fill = "white", linewidth = 0.1) +
theme_map() +
geom_point(aes(fill = realm), size = 2, shape = 21, colour = "black", stroke = 0.2) +
scale_fill_fish_d(option = "Scarus_quoyi",
labels = class_realm, name = "") +
coord_quickmap(expand = FALSE) +
theme(
legend.position = c(0.03, 0.2),
legend.justification = c(0, 0),
legend.background = element_rect(fill = "transparent", color = NA), # Remove legend frame
legend.margin = margin(2, 2, 2, 2), # Reduce legend margin
legend.text = element_text(size = 8), # Reduce legend text size
legend.title = element_text(size = 10, face = "bold"),
legend.key.size = unit(0.8, "lines"), # Reduce legend symbol size
panel.background = element_rect(fill = "white"),
axis.text = element_text(size = 10),
axis.title = element_text(size = 10),
axis.title.y = element_text(angle = 90, vjust = 0.5) # Rotate y-axis title
) +
guides(fill = guide_legend(override.aes = list(size = 2))) +
scale_x_continuous(name = "Longitude (degrees)", breaks = seq(-180, 180, 60)) +
scale_y_continuous(name = "Latitude (degrees)", breaks = seq(-90, 90, 30))
cell_size_studies
dat_summary <- dat %>%
group_by(country_collection) %>%
filter(!is.na(country_collection)) %>%
summarise(study_count = length(unique(key))) # Count unique studies (keys) per country
names(dat_summary)
## [1] "country_collection" "study_count"
# Obtain geospatial data for countries
world <- ne_download(scale = 110, type = "countries", category = "cultural", returnclass = "sf")
## Reading layer `ne_110m_admin_0_countries' from data source
## `/private/var/folders/rl/4qqy5shj5nldjmznhtd96p5m0000gp/T/RtmpjF31Ek/ne_110m_admin_0_countries.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 177 features and 168 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
## Geodetic CRS: WGS 84
# Merge the geospatial data with the study data
world_data <- world %>%
left_join(dat_summary, by = c("SOVEREIGNT" = "country_collection"))
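A `left_join()` keeps every row of the map data and leaves `study_count` as `NA` wherever a country name has no exact match, so spelling differences between the two sources go unnoticed. The sketch below shows one way to list such names before joining. The two vectors are toy stand-ins for `world$SOVEREIGNT` and `dat_summary$country_collection`, and "Czech Republic" is a hypothetical mismatch.

```r
# Toy stand-ins (illustrative only)
map_names   <- c("Brazil", "Czechia", "United States of America")
study_names <- c("Brazil", "Czech Republic", "United States of America")

# Names present in the study summary but absent from the map data
unmatched <- setdiff(study_names, map_names)
unmatched
```

Any name returned here would need recoding before the join for its studies to appear on the map.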
# This plot was excluded after the first round of revisions
country_studies <- ggplot(world_data) +
geom_sf(aes(fill = study_count), color = "black") +
theme_map() +
scale_fill_gradient(low = "white",
high = "gray20",
na.value = "white",
name = "Nº Studies",
limits = c(0, max(world_data$study_count, na.rm = TRUE))) +
coord_sf(expand = FALSE) +
theme(
legend.position = c(0.03, 0.2),
legend.justification = c(0, 0),
legend.background = element_rect(fill = "transparent", color = NA),
legend.margin = margin(2, 2, 2, 2),
legend.text = element_text(size = 8),
legend.title = element_text(size = 10, face = "bold"),
legend.key.size = unit(0.8, "lines"),
panel.background = element_rect(fill = "white"),
axis.text = element_text(size = 10),
axis.title = element_text(size = 10),
axis.title.y = element_text(angle = 90, vjust = 0.5)
) +
scale_x_continuous(name = "Longitude (degrees)",
breaks = seq(-180, 180, 60),
labels = scales::number_format(accuracy = 1)) +
scale_y_continuous(name = "Latitude (degrees)",
breaks = seq(-90, 90, 30),
labels = scales::number_format(accuracy = 1)) +
guides(fill = guide_legend(override.aes = list(size = 2)))
# Counting Studies Per Subcontinent
subconti_by_studies <- dat %>%
filter(!is.na(subcontinent)) %>%
group_by(subcontinent) %>%
summarise(num_studies = n_distinct(key)) %>%
arrange(desc(num_studies))
# Plotting Studies by subcontinent
plot_subcont <- ggplot(subconti_by_studies, aes(x = reorder(subcontinent, num_studies), y = num_studies)) +
geom_bar(stat = "identity", fill = "#00AFBB", width = 0.7) +
coord_flip() +
theme_pubr() +
labs(
x = "Subcontinent",
y = "Number of Studies"
) +
theme(
axis.title.x = element_text(face = "bold", size = 12),
axis.title.y = element_text(face = "bold", size = 12),
axis.text.y = element_text(size = 10)
)
plot_subcont
# For the revised version of our article, one of the reviewer's suggestions was ...
Figure_3 <- plot_grid(
cell_size_studies,
# country_studies,
plot_subcont,
labels = c("A", "B"),
nrow = 2,
ncol = 1,
label_size = 15
)
# Store Plots
ggsave('../manuscript/Figure_3.pdf', Figure_3, width = 7, height = 10)
ggsave('../manuscript/Figure_3.png', Figure_3, width = 7, height = 10, dpi = 2000)
The following steps describe how to visualise the data associated with each trait and species within the phylogenetic tree. To accomplish this, we will first calculate mean values per species and trait, select the columns of interest, and rescale these values to the 0-1 range (min-max scaling per trait) to enhance the colour contrast of the scale.
# Summarise data: calculate the mean value per species for each trait
summary_data <- dat %>%
group_by(species_underscored) %>%
summarise(across(c(cell_area, cell_volume, nucleus_area, nucleus_volume, mcv),
~ mean(., na.rm = TRUE)),
.groups = "drop") %>%
rename(
"Cell area" = cell_area,
"Cell volume" = cell_volume,
"Nucleus area" = nucleus_area,
"Nucleus volume" = nucleus_volume,
"MCV" = mcv
)
# Min-max scaling (0-1) across all species, applied separately to each variable
summary_data <- summary_data %>%
mutate(across(c("Cell area", "Cell volume", "Nucleus area", "Nucleus volume", "MCV"),
~ (.-min(., na.rm = TRUE)) / (max(., na.rm = TRUE) - min(., na.rm = TRUE))))
# Script section adapted by FPLeiva after RMolinaVenegas (Thanks, Rafa)
summary_data
## # A tibble: 660 × 6
## species_underscored `Cell area` `Cell volume` `Nucleus area` `Nucleus volume`
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Abalistes_stellatus 0.0298 NaN 0.0310 NaN
## 2 Abramis_brama 0.0465 0.0141 NaN NaN
## 3 Abudefduf_saxatilis 0.0415 0.0128 0.00995 NaN
## 4 Abudefduf_septemfa… NaN NaN NaN 0.00549
## 5 Abudefduf_sordidus 0.0738 0.0285 0.120 0.0561
## 6 Abudefduf_taurus 0.0460 0.0149 NaN NaN
## 7 Abudefduf_vaigiens… 0.0769 0.0309 0.119 0.0538
## 8 Acanthocybium_sola… 0.0618 0.0193 NaN NaN
## 9 Acanthogobius_hasta 0.0657 0.0230 0.110 0.0765
## 10 Acanthopagrus_aust… NaN NaN NaN 0.0123
## # ℹ 650 more rows
## # ℹ 1 more variable: MCV <dbl>
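The `NaN` entries in the table above arise because `mean()` with `na.rm = TRUE` returns `NaN` when a species has no values at all for a trait. The sketch below reproduces this behaviour and the min-max rescaling on a few invented numbers. The helper name `rescale01` is hypothetical and only mirrors the expression used in the script.

```r
# Hypothetical helper mirroring the min-max rescaling used above
rescale01 <- function(x) {
  (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}

# mean() with na.rm = TRUE returns NaN when every value is missing
all_missing_mean <- mean(c(NA_real_, NA_real_), na.rm = TRUE)

# Illustrative species means for one trait, including one species without data
trait_means <- c(50, 100, 150, all_missing_mean)
scaled <- rescale01(trait_means)
scaled
```

The smallest and largest means map to 0 and 1, the intermediate value to 0.5, and the species without data stays `NaN`.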
unique(summary_data$species_underscored)
## [1] "Abalistes_stellatus" "Abramis_brama"
## [3] "Abudefduf_saxatilis" "Abudefduf_septemfasciatus"
## [5] "Abudefduf_sordidus" "Abudefduf_taurus"
## [7] "Abudefduf_vaigiensis" "Acanthocybium_solandri"
## [9] "Acanthogobius_hasta" "Acanthopagrus_australis"
## [11] "Acanthopagrus_butcheri" "Acanthostracion_polygonius"
## [13] "Acanthostracion_quadricornis" "Acanthurus_bahianus"
## [15] "Acanthurus_chirurgus" "Acanthurus_coeruleus"
## [17] "Acanthurus_gahhm" "Acanthurus_grammoptilus"
## [19] "Acipenser_brevirostrum" "Acipenser_oxyrinchus"
## [21] "Acipenser_sinensis" "Acipenser_sturio"
## [23] "Aeoliscus_strigatus" "Aetobatus_narinari"
## [25] "Albula_vulpes" "Aldrichetta_forsteri"
## [27] "Alosa_fallax" "Alphestes_afer"
## [29] "Aluterus_schoepfii" "Aluterus_scriptus"
## [31] "Ameiurus_catus" "Ammodytes_tobianus"
## [33] "Amphiprion_akindynos" "Amphiprion_clarkii"
## [35] "Anabas_testudineus" "Anguilla_bicolor"
## [37] "Anguilla_japonica" "Anguilla_marmorata"
## [39] "Anguilla_rostrata" "Aplodactylus_arctidens"
## [41] "Apogon_maculatus" "Aprion_virescens"
## [43] "Aptychotrema_rostrata" "Archosargus_rhomboidalis"
## [45] "Arothron_manilensis" "Arripis_trutta"
## [47] "Astyanax_lineatus" "Astyanax_mexicanus"
## [49] "Atheresthes_evermanni" "Atractosteus_tristoechus"
## [51] "Aulacocephalus_temminckii" "Auxis_thazard"
## [53] "Bairdiella_ronchus" "Balistapus_undulatus"
## [55] "Balistes_capriscus" "Balistes_carolinensis"
## [57] "Balistes_vetula" "Barbatula_barbatula"
## [59] "Barbatula_toni" "Basilichthys_australis"
## [61] "Bathygobius_soporator" "Bathyraja_parmifera"
## [63] "Bathytoshia_centroura" "Belone_belone"
## [65] "Bentartia_pusillum" "Betta_splendens"
## [67] "Boops_boops" "Boreogadus_saida"
## [69] "Bothus_lunatus" "Bovichtus_angustifrons"
## [71] "Brachygenys_chrysargyrea" "Brevoortia_tyrannus"
## [73] "Brycon_hilarii" "Bujurquina_vittata"
## [75] "Caelorinchus_innotabilis" "Caesio_cuning"
## [77] "Calamus_bajonado" "Calamus_calamus"
## [79] "Calamus_penna" "Calamus_pennatula"
## [81] "Callionymus_lyra" "Cantherhines_pullus"
## [83] "Canthigaster_bennetti" "Canthigaster_valentini"
## [85] "Carangoides_bartholomaei" "Carangoides_ruber"
## [87] "Caranx_carangus" "Caranx_hippos"
## [89] "Caranx_ignobilis" "Caranx_latus"
## [91] "Caranx_lugubris" "Caranx_sexfasciatus"
## [93] "Carassius_auratus" "Carassius_carassius"
## [95] "Carassius_gibelio" "Carcharhinus_brachyurus"
## [97] "Carcharhinus_falciformis" "Carcharhinus_leucas"
## [99] "Carcharhinus_maculipinnis" "Carcharhinus_melanopterus"
## [101] "Carcharhinus_milberti" "Carcharhinus_obscurus"
## [103] "Carcharhinus_plumbeus" "Catostomus_catostomus"
## [105] "Catostomus_commersonii" "Centriscops_humerosus"
## [107] "Centropomus_undecimalis" "Centropyge_bicolor"
## [109] "Centroscymnus_coelolepis" "Centroscymnus_crepidater"
## [111] "Centroscymnus_owstoni" "Cephalopholis_cruentata"
## [113] "Cephalopholis_fulva" "Cephalopholis_miniata"
## [115] "Chaetodipterus_faber" "Chaetodon_capistratus"
## [117] "Chaetodon_lunulatus" "Chaetodon_ocellatus"
## [119] "Chaetodon_rainfordi" "Chaetodon_sedentarius"
## [121] "Chaetodon_striatus" "Channa_argus"
## [123] "Channa_punctata" "Channa_striata"
## [125] "Cheilinus_trilobatus" "Chelidonichthys_cuculus"
## [127] "Chelidonichthys_lucerna" "Chelon_ramada"
## [129] "Chilomycterus_spinosus" "Chiloscyllium_punctatum"
## [131] "Chirocentrus_dorab" "Chloroscombrus_chrysurus"
## [133] "Choerodon_albigena" "Choerodon_cephalotes"
## [135] "Choerodon_fasciatus" "Chromis_analis"
## [137] "Chromis_viridis" "Chrosomus_neogaeus"
## [139] "Chrysiptera_cyanea" "Cichlasoma_dimerus"
## [141] "Ciliata_mustela" "Cirrhinus_mrigala"
## [143] "Cirrhinus_reba" "Clarias_batrachus"
## [145] "Clarias_gariepinus" "Clupea_harengus"
## [147] "Clupeonella_cultriventris" "Cobitis_biwae"
## [149] "Cobitis_striata" "Cobitis_taenia"
## [151] "Cobitis_takatsuensis" "Coelorinchus_maurofasciatus"
## [153] "Colossoma_macropomum" "Conger_conger"
## [155] "Contusus_brevicaudus" "Coregonus_clupeaformis"
## [157] "Coregonus_maraena" "Coreius_guichenoti"
## [159] "Coris_batuensis" "Coryphaena_hippurus"
## [161] "Coryphaenoides_serrulatus" "Corythoichthys_intestinalis"
## [163] "Cottus_gobio" "Crossosalarias_macrospilus"
## [165] "Cryptacanthodes_maculatus" "Cryptocentrus_leptocephalus"
## [167] "Ctenopharyngodon_idella" "Cyclopteropsis_jordani"
## [169] "Cyclopterus_lumpus" "Cyprinus_carpio"
## [171] "Dactylopterus_volitans" "Danio_rerio"
## [173] "Dascyllus_aruanus" "Datnioides_polota"
## [175] "Delminichthys_ghetaldii" "Diagramma_labiosum"
## [177] "Diagramma_picta" "Diapterus_rhombeus"
## [179] "Diastobranchus_capensis" "Dicentrarchus_labrax"
## [181] "Diodon_holocanthus" "Diplodus_argenteus"
## [183] "Diplodus_vulgaris" "Dipturus_batis"
## [185] "Dipturus_chilensis" "Dipturus_laevis"
## [187] "Diretmichthys_parini" "Dischistodus_prosopotaenia"
## [189] "Dissostichus_mawsoni" "Dormitator_latifrons"
## [191] "Drepane_punctata" "Echeneis_naucrates"
## [193] "Ecsenius_mandibularis" "Ecsenius_yaeyamaensis"
## [195] "Electrophorus_electricus" "Ellochelon_vaigiensis"
## [197] "Elopichthys_bambusa" "Elops_saurus"
## [199] "Engraulis_anchoita" "Engraulis_encrasicolus"
## [201] "Epalzeorhynchos_bicolor" "Epalzeorhynchos_frenatum"
## [203] "Epinephelus_adscensionis" "Epinephelus_cyanopodus"
## [205] "Epinephelus_fasciatus" "Epinephelus_guttatus"
## [207] "Epinephelus_merra" "Epinephelus_ongus"
## [209] "Epinephelus_quoyans" "Epinephelus_spilotoceps"
## [211] "Epinephelus_striatus" "Equetus_pulcher"
## [213] "Esox_lucius" "Esox_niger"
## [215] "Etmopterus_brachyurus" "Etmopterus_granulosus"
## [217] "Eucinostomus_argenteus" "Eucinostomus_gula"
## [219] "Eugerres_plumieri" "Eumicrotremus_spinosus"
## [221] "Eupomacentrus_fuscus" "Eupomacentrus_leucostictus"
## [223] "Eupomacentrus_variabilis" "Euristhmus_lepturus"
## [225] "Euthynnus_alletteratus" "Eutrigla_gurnardus"
## [227] "Farlowella_acus" "Fistularia_petimba"
## [229] "Fluvitrygon_signifer" "Gadus_morhua"
## [231] "Gaidropsarus_ensis" "Gaidropsarus_mediterraneus"
## [233] "Galaxias_maculatus" "Galaxias_olidus"
## [235] "Galeocerdo_cuvier" "Gambusia_holbrooki"
## [237] "Gasterosteus_aculeatus" "Geotria_australis"
## [239] "Gerres_cinereus" "Gerres_filamentosus"
## [241] "Gerres_subfasciatus" "Ginglymostoma_cirratum"
## [243] "Girella_elevata" "Girella_zebra"
## [245] "Glyptocephalus_cynoglossus" "Glyptosternon_maculatum"
## [247] "Gnathanodon_speciosus" "Gobio_gobio"
## [249] "Gobiocypris_rarus" "Gobiodon_citrinus"
## [251] "Gobius_cobitis" "Gymnelus_viridis"
## [253] "Gymnocephalus_cernua" "Gymnocranius_audleyi"
## [255] "Gymnocypris_eckloni" "Gymnothorax_funebris"
## [257] "Gymnothorax_pictus" "Gymnothorax_vicinus"
## [259] "Gymnotus_inaequilabiatus" "Gyrinocheilus_aymonieri"
## [261] "Haemulon_aurolineatum" "Haemulon_flavolineatum"
## [263] "Haemulon_plumierii" "Haemulon_sciurus"
## [265] "Halargyreus_johnsonii" "Halichoeres_biocellatus"
## [267] "Halichoeres_bivittatus" "Halichoeres_garnoti"
## [269] "Halichoeres_radiatus" "Harengula_humeralis"
## [271] "Helicolenus_barathri" "Helicolenus_percoides"
## [273] "Hemiglyphidodon_plagiometopon" "Hemiramphus_brasiliensis"
## [275] "Hemiscyllium_ocellatum" "Hemitripterus_americanus"
## [277] "Hemitrygon_bennettii" "Heterodontus_francisci"
## [279] "Heteropneustes_fossilis" "Heterotis_niloticus"
## [281] "Hippocampus_abdominalis" "Hippoglossus_hippoglossus"
## [283] "Hirundichthys_affinis" "Holacanthus_bermudensis"
## [285] "Holacanthus_ciliaris" "Holacanthus_tricolor"
## [287] "Holocentrus_ascensionis" "Holocentrus_rufus"
## [289] "Hoplias_malabaricus" "Hoplisoma_metae"
## [291] "Hoplisoma_paleatus" "Hucho_hucho"
## [293] "Huso_huso" "Hypanus_americanus"
## [295] "Hypophthalmichthys_molitrix" "Hypophthalmichthys_nobilis"
## [297] "Hypoplectrus_unicolor" "Hyporhamphus_melanochir"
## [299] "Hypostomus_boulengeri" "Hypostomus_plecostomus"
## [301] "Icelus_spatula" "Ictalurus_punctatus"
## [303] "Idiacanthus_atlanticus" "Iranocichla_hormuzensis"
## [305] "Istigobius_rigilius" "Isurus_oxyrinchus"
## [307] "Jenynsia_lineata" "Kathetostoma_canaster"
## [309] "Katsuwonus_pelamis" "Konosirus_punctatus"
## [311] "Labeo_catla" "Labeo_chrysophekadion"
## [313] "Labeo_rohita" "Labrisomus_nuchipinnis"
## [315] "Lachnolaimus_maximus" "Lactophrys_trigonus"
## [317] "Lagocephalus_lunaris" "Lamna_nasus"
## [319] "Lampetra_fluviatilis" "Lampetra_planeri"
## [321] "Lampris_regius" "Lates_calcarifer"
## [323] "Lefua_echigonia" "Lefua_nikkonis"
## [325] "Leiopotherapon_unicolor" "Lepidopsetta_bilineata"
## [327] "Lepomis_macrochirus" "Lethrinus_atkinsoni"
## [329] "Lethrinus_miniatus" "Lethrinus_nebulosus"
## [331] "Lethrinus_rubrioperculatus" "Leucaspius_delineatus"
## [333] "Leuciscus_idus" "Leucoraja_erinaceus"
## [335] "Leucoraja_ocellata" "Limanda_aspera"
## [337] "Limanda_limanda" "Liparis_tunicatus"
## [339] "Lipophrys_pholis" "Lophius_americanus"
## [341] "Lophius_piscatorius" "Lutjanus_adetii"
## [343] "Lutjanus_analis" "Lutjanus_apodus"
## [345] "Lutjanus_carponotatus" "Lutjanus_cyanopterus"
## [347] "Lutjanus_fulviflamma" "Lutjanus_griseus"
## [349] "Lutjanus_lutjanus" "Lutjanus_russellii"
## [351] "Lutjanus_sebae" "Lutjanus_synagris"
## [353] "Lutjanus_vitta" "Lutjanus_vivanus"
## [355] "Lycodichthys_dearborni" "Macropodus_opercularis"
## [357] "Macrourus_berglax" "Makaira_nigricans"
## [359] "Megaleporinus_macrocephalus" "Megalobrama_amblycephala"
## [361] "Megalops_cyprinoides" "Melanogrammus_aeglefinus"
## [363] "Merlangius_merlangus" "Merluccius_bilinearis"
## [365] "Merluccius_hubbsi" "Merluccius_merluccius"
## [367] "Mesogobius_batrachocephalus" "Mesovagus_antipodum"
## [369] "Metynnis_hypsauchen" "Metynnis_maculatus"
## [371] "Micropogonias_furnieri" "Micropterus_coosae"
## [373] "Micropterus_salmoides" "Microspathodon_chrysurus"
## [375] "Misgurnus_anguillicaudatus" "Mola_mola"
## [377] "Monopterus_albus" "Morone_americana"
## [379] "Morone_saxatilis" "Mugil_cephalus"
## [381] "Mugil_curema" "Mugil_liza"
## [383] "Mulloidichthys_martinicus" "Mulloidichthys_vanicolensis"
## [385] "Mullus_barbatus" "Mullus_surmuletus"
## [387] "Muraenesox_cinereus" "Mustelus_canis"
## [389] "Myoxocephalus_octodecemspinosus" "Myoxocephalus_quadricornis"
## [391] "Myoxocephalus_scorpius" "Myripristis_jacobus"
## [393] "Mystus_vittatus" "Myxine_glutinosa"
## [395] "Myxocyprinus_asiaticus" "Myxus_elongatus"
## [397] "Myzopsetta_ferruginea" "Nebrius_ferrugineus"
## [399] "Nectamia_savayensis" "Negaprion_brevirostris"
## [401] "Nematalosa_come" "Neoceratodus_forsteri"
## [403] "Neocyttus_rhomboidalis" "Neogobius_melanostomus"
## [405] "Neoscopelus_macrolepidotus" "Neotrygon_kuhlii"
## [407] "Niwaella_delicata" "Notolabrus_tetricus"
## [409] "Notopterus_notopterus" "Novaculichthys_taeniourus"
## [411] "Nuchequula_decora" "Ocyurus_chrysurus"
## [413] "Odontesthes_argentinensis" "Ogcocephalus_vespertilio"
## [415] "Oligoplites_saurus" "Oncorhynchus_keta"
## [417] "Oncorhynchus_kisutch" "Oncorhynchus_mykiss"
## [419] "Oncorhynchus_tshawytscha" "Ophichthus_cephalozona"
## [421] "Oplopomus_oplopomus" "Opsanus_tau"
## [423] "Orectolobus_ornatus" "Oreochromis_mossambicus"
## [425] "Oreochromis_niloticus" "Osmerus_eperlanus"
## [427] "Osmerus_mordax" "Ostorhinchus_cookii"
## [429] "Ostorhinchus_endekataenia" "Ostorhinchus_guamensis"
## [431] "Oxynotus_bruniensis" "Oxynotus_centrina"
## [433] "Pachypanchax_playfairii" "Pagellus_bogaraveo"
## [435] "Pagellus_erythrinus" "Pagrus_auratus"
## [437] "Pangasianodon_hypophthalmus" "Parachanna_obscura"
## [439] "Paragobiodon_xanthosoma" "Paralichthys_lethostigma"
## [441] "Paralichthys_olivaceus" "Paramugil_georgii"
## [443] "Parapercis_cylindrica" "Parapercis_hexophtalma"
## [445] "Parupeneus_forsskali" "Pelates_quadrilineatus"
## [447] "Pempheris_schomburgkii" "Perca_flavescens"
## [449] "Perca_fluviatilis" "Petromyzon_marinus"
## [451] "Petroscirtes_fallax" "Petroscirtes_lupus"
## [453] "Petroscirtes_mitratus" "Phoxinus_phoxinus"
## [455] "Piaractus_brachypomus" "Piaractus_mesopotamicus"
## [457] "Pimelodella_gracilis" "Plagioscion_squamosissimus"
## [459] "Plagiotremus_rhinorhynchos" "Planiliza_macrolepis"
## [461] "Platichthys_flesus" "Platybelone_argala"
## [463] "Platycephalus_bassensis" "Platycephalus_indicus"
## [465] "Plectropomus_leopardus" "Pleuronectes_platessa"
## [467] "Podothecus_accipenserinus" "Poecilia_mexicana"
## [469] "Poecilia_reticulata" "Pollachius_virens"
## [471] "Polypterus_palmas" "Polypterus_senegalus"
## [473] "Pomacanthus_arcuatus" "Pomacanthus_paru"
## [475] "Pomacentrus_nagasakiensis" "Pomadasys_kaakan"
## [477] "Porichthys_porosissimus" "Premnas_biaculeatus"
## [479] "Priacanthus_tayenus" "Prionace_glauca"
## [481] "Prionotus_carolinus" "Prionotus_evolans"
## [483] "Pristiapogon_kallopterus" "Prochilodus_lineatus"
## [485] "Prognathodes_aculeatus" "Proscymnodon_plunketi"
## [487] "Prosopium_cylindraceum" "Protopterus_aethiopicus"
## [489] "Protopterus_annectens" "Psalidodon_anisitsi"
## [491] "Psettodes_erumei" "Pseudaphritis_urvillii"
## [493] "Pseudocaranx_dentex" "Pseudomonacanthus_peroni"
## [495] "Pseudoplatystoma_corruscans" "Pseudopleuronectes_americanus"
## [497] "Pseudorhombus_jenynsii" "Pseudupeneus_maculatus"
## [499] "Pterois_volitans" "Pterophyllum_scalare"
## [501] "Pterygoplichthys_pardalis" "Pungitius_pungitius"
## [503] "Pygocentrus_nattereri" "Rachycentron_canadum"
## [505] "Raja_clavata" "Raja_montagui"
## [507] "Rastrelliger_kanagurta" "Repomucenus_limiceps"
## [509] "Rhamdia_quelen" "Rhamphichthys_rostratus"
## [511] "Rhinesomus_triqueter" "Rhizoprionodon_terraenovae"
## [513] "Rhynchocypris_lagowskii" "Rita_rita"
## [515] "Rostroraja_eglanteria" "Rutilus_kutum"
## [517] "Rutilus_rutilus" "Rypticus_saponaceus"
## [519] "Salminus_affinis" "Salmo_caspius"
## [521] "Salmo_salar" "Salmo_trutta"
## [523] "Salvelinus_alpinus" "Salvelinus_fontinalis"
## [525] "Salvelinus_namaycush" "Salvelinus_umbla"
## [527] "Sander_vitreus" "Sarda_australis"
## [529] "Sardina_pilchardus" "Sardinella_gibbosa"
## [531] "Sarpa_salpa" "Scardinius_erythrophthalmus"
## [533] "Scarus_coeruleus" "Scarus_croicensis"
## [535] "Scarus_ghobban" "Scarus_guacamaia"
## [537] "Scarus_schlegeli" "Scarus_taeniopterus"
## [539] "Schizopyge_niger" "Schizothorax_plagiostomus"
## [541] "Schizothorax_prenanti" "Scleropages_jardinii"
## [543] "Scolopsis_monogramma" "Scolopsis_vosmeri"
## [545] "Scomber_scombrus" "Scomberomorus_regalis"
## [547] "Scophthalmus_maximus" "Scophthalmus_rhombus"
## [549] "Scorpaena_cardinalis" "Scorpaena_plumieri"
## [551] "Scorpaena_porcus" "Scorpaenopsis_oxycephalus"
## [553] "Scorpis_aequipinnis" "Scyliorhinus_canicula"
## [555] "Scyliorhinus_stellaris" "Sebastes_alutus"
## [557] "Sebastes_marinus" "Sebastes_ocutalus"
## [559] "Sebastes_polyspinis" "Sebastes_schlegelii"
## [561] "Selene_vomer" "Selenotoca_multifasciata"
## [563] "Semotilus_corporalis" "Seriola_hippos"
## [565] "Seriola_lalandi" "Seriola_quinqueradiata"
## [567] "Serrasalmus_eigenmanni" "Siganus_doliatus"
## [569] "Siganus_fuscescens" "Siganus_lineatus"
## [571] "Siganus_spinus" "Siganus_sutor"
## [573] "Signigobius_biocellatus" "Sillaginodes_punctatus"
## [575] "Sillago_analis" "Silurus_asotus"
## [577] "Siniperca_chuatsi" "Solea_senegalensis"
## [579] "Solea_solea" "Soleichthys_heterorhinos"
## [581] "Sorubim_cuspicaudus" "Sorubim_lima"
## [583] "Sparisoma_aurofrenatum" "Sparisoma_chrysopterum"
## [585] "Sparisoma_radians" "Sparisoma_viride"
## [587] "Sparus_aurata" "Sphoeroides_greeleyi"
## [589] "Sphoeroides_maculatus" "Sphoeroides_spengleri"
## [591] "Sphoeroides_testudineus" "Sphyraena_barracuda"
## [593] "Sphyraena_obtusata" "Sphyrna_lewini"
## [595] "Sphyrna_mokarran" "Sphyrna_tiburo"
## [597] "Sphyrna_tudes" "Sphyrna_zygaena"
## [599] "Spicara_maena" "Sprattus_sprattus"
## [601] "Squalius_cephalus" "Squalus_acanthias"
## [603] "Squatina_australis" "Squatina_squatina"
## [605] "Stenotomus_chrysops" "Stephanolepis_hispida"
## [607] "Sufflamen_fraenatus" "Symphodus_tinca"
## [609] "Synbranchus_marmoratus" "Syngnathus_fuscus"
## [611] "Syngnathus_scovelli" "Syngnathus_typhle"
## [613] "Synodontis_notatus" "Synodus_intermedius"
## [615] "Synodus_sageneus" "Tachysurus_fulvidraco"
## [617] "Taurulus_bubalis" "Tautoga_onitis"
## [619] "Terapon_jarbua" "Terapon_puta"
## [621] "Tetrabrachium_ocellatum" "Tetraodon_nigroviridis"
## [623] "Tetronarce_nobiliana" "Thalassoma_bifasciatum"
## [625] "Thalassoma_klunzingeri" "Thalassoma_lucasanum"
## [627] "Thalassoma_lunare" "Thunnus_alalunga"
## [629] "Thunnus_albacares" "Thunnus_atlanticus"
## [631] "Thymallus_arcticus" "Thymallus_thymallus"
## [633] "Tinca_tinca" "Torpedo_torpedo"
## [635] "Trachinocephalus" "Trachinotus_botla"
## [637] "Trachinotus_coppingeri" "Trachinotus_falcatus"
## [639] "Trachinus_draco" "Trachurus_trachurus"
## [641] "Trachystoma_petardi" "Trematomus_bernacchii"
## [643] "Tripodichthys_angustifrons" "Trisopterus_luscus"
## [645] "Turrum_fulvoguttatum" "Turrum_gymnostethus"
## [647] "Tylosurus_gavialoides" "Ucla_xenogrammus"
## [649] "Ulua_aurochs" "Umbrina_coroides"
## [651] "Upeneichthys_lineatus" "Uranoscopus_scaber"
## [653] "Urophycis_tenuis" "Valenciennea_longipinnis"
## [655] "Xiphias_gladius" "Xiphophorus_hellerii"
## [657] "Zebrasoma_scopas" "Zenopsis_nebulosus"
## [659] "Zeus_faber" "Zoarces_americanus"
# Identify species to drop
species_to_drop <- setdiff(tree$tip.label, summary_data$species_underscored)
# Prune the tree
tree <- drop.tip(tree, species_to_drop)
# Align data and tree
datF <- summary_data %>%
column_to_rownames("species_underscored")
dat_tree <- datF %>%
filter(row.names(.) %in% tree$tip.label)
# Add missing species
missing_species <- setdiff(tree$tip.label, row.names(dat_tree))
dat_tree_NA <- data.frame(matrix(NA, nrow = length(missing_species), ncol = ncol(datF)))
row.names(dat_tree_NA) <- missing_species
colnames(dat_tree_NA) <- colnames(datF)
Data <- rbind(dat_tree, dat_tree_NA)
Data <- Data[match(tree$tip.label, row.names(Data)), ]
# Plot the tree without species names
circ <- ggtree(tree, layout = "fan", open.angle = 15, branch.length = "none")
circ <- rotate_tree(circ, 90)
circ
# Create a new plot with heatmap for each trait using a single scale
tree_data <- gheatmap(
circ,
Data,
width = 0.2,
offset = 0, # Offset for placing the heatmap
colnames_offset_x = 0,
colnames_offset_y = 0,
font.size = 0,
hjust = 0
)
tree_data
# Apply the same scale for all traits
tree_data <- tree_data +
scale_fill_viridis_c(option = "H", name = "Normalised Cell Traits", na.value = "white") +
theme(
legend.position = c(0.58, 0.5),
legend.title.position = "top",
legend.title.align = 0.5,
legend.direction = "horizontal",
legend.background = element_rect(fill = "transparent", color = NA),
legend.key = element_rect(fill = "transparent", color = NA),
legend.box.background = element_rect(fill = "transparent", color = NA),
panel.background = element_rect(fill = "transparent", color = NA), # Transparent panel
plot.background = element_rect(fill = "transparent", color = NA), # Transparent plot
panel.grid.major = element_blank(), # No grid
panel.grid.minor = element_blank()
)
tree_data
ggsave("../manuscript/Figure_4.png", tree_data, width = 12, height = 12)
ggsave("../manuscript/Figure_4.pdf", tree_data, width = 12, height = 12)
# Plot with species names
circ_names <- ggtree(tree, layout = "fan", open.angle = 15) +
geom_tiplab(offset = 0.22, hjust = 0, size = 0.8)
circ_names <- rotate_tree(circ_names, 90)
# Create a new plot with heatmap for each trait using a single scale
tree_data_names <- gheatmap(
circ_names,
Data,
width = 0.2,
offset = 0, # Offset for placing the heatmap
colnames_offset_x = 0,
colnames_offset_y = 0,
font.size = 4,
hjust = 0
)
tree_data_names
# Apply the same scale for all traits
tree_data_names <- tree_data_names +
scale_fill_viridis_c(option = "H", name = "Normalised Cell Traits", na.value = "grey90") +
theme(
legend.position = c(0.6, 0.5),
legend.title.position = "top",
legend.title.align = 0.5,
legend.direction = "horizontal",
legend.key = element_blank(),
legend.background=element_blank(),
legend.key.width = unit(1, "cm"),
legend.key.height = unit(0.7, "cm")
)
tree_data_names
# Counting Studies Per Species
species_by_studies <- dat %>%
group_by(species) %>%
summarise(num_studies = n_distinct(key)) %>%
arrange(desc(num_studies)) %>%
slice_head(n = 15) # Limit to top 15 species based on the number of studies
plot_studies <- ggplot(species_by_studies, aes(x = reorder(species, num_studies), y = num_studies)) +
geom_bar(stat = "identity", fill = "#00AFBB", width = 0.7) +
coord_flip() +
theme_pubr() +
labs(
x = "Species",
y = "Number of Studies"
) +
theme(
axis.title.x = element_text(face = "bold", size = 12),
axis.title.y = element_text(face = "bold", size = 12),
axis.text.y = element_text(size = 10, face = "italic") # Italicise species names
)
plot_studies
# ------------------------------------------------------------------------------
# Counting Records per Species
species_by_records <- dat %>%
group_by(species) %>%
summarise(num_records = n()) %>%
arrange(desc(num_records)) %>%
slice_head(n = 15) # Limit to top 15 species based on the number of records
plot_records <- ggplot(species_by_records, aes(x = reorder(species, num_records), y = num_records)) +
geom_bar(stat = "identity", fill = "#009E73", width = 0.7) +
coord_flip() +
theme_pubr() +
labs(
x = "Species",
y = "Number of Records"
) +
theme(
axis.title.x = element_text(face = "bold", size = 12),
axis.title.y = element_text(face = "bold", size = 12),
axis.text.y = element_text(size = 10, face = "italic") # Italicise species names
)
plot_records
# ------------------------------------------------------------------------------
# Combine Plots
Figure_5 <- plot_grid(
plot_studies,
plot_records,
labels = c("A", "B"),
nrow = 2,
ncol = 1,
label_size = 15,
align = "hv"
)
# Store Plots
ggsave('../manuscript/Figure_5.pdf', Figure_5, width = 7, height = 9)
ggsave('../manuscript/Figure_5.png', Figure_5, width = 7, height = 9, dpi = 1200)
# Count the total number of species
n_spp <- dat %>%
distinct(species) %>%
nrow()
n_spp
## [1] 660
# Counting the most common species (by number of studies)
dat %>%
group_by(species) %>%
summarise(num_studies = n_distinct(key)) %>%
arrange(desc(num_studies)) %>%
slice_head(n = 15) # Limit to top 15 species based on the number of studies
## # A tibble: 15 × 2
## species num_studies
## <chr> <int>
## 1 Oncorhynchus mykiss 17
## 2 Cyprinus carpio 12
## 3 Labeo rohita 10
## 4 Oreochromis niloticus 8
## 5 Ctenopharyngodon idella 7
## 6 Carassius auratus 6
## 7 Channa punctata 6
## 8 Clarias gariepinus 6
## 9 Salmo trutta 6
## 10 Clarias batrachus 5
## 11 Lophius piscatorius 5
## 12 Salmo salar 5
## 13 Tinca tinca 5
## 14 Dicentrarchus labrax 4
## 15 Echeneis naucrates 4
# ------------------------------------------------------------------------------
# Counting records by species
dat %>%
group_by(species) %>%
summarise(num_records = n()) %>%
arrange(desc(num_records)) %>%
slice_head(n = 15) # Limit to top 15 species based on the number of records
## # A tibble: 15 × 2
## species num_records
## <chr> <int>
## 1 Ctenopharyngodon idella 82
## 2 Oreochromis niloticus 71
## 3 Hypophthalmichthys molitrix 63
## 4 Labeo rohita 29
## 5 Oncorhynchus mykiss 24
## 6 Cyprinus carpio 18
## 7 Dicentrarchus labrax 17
## 8 Channa punctata 15
## 9 Salmo salar 13
## 10 Clarias batrachus 12
## 11 Abudefduf saxatilis 11
## 12 Haemulon aurolineatum 11
## 13 Haemulon flavolineatum 11
## 14 Holocentrus ascensionis 11
## 15 Lutjanus griseus 11
# ------------------------------------------------------------------------------
# Number of species and percentage by Order
spp_order <- dat %>%
group_by(class, order) %>%
reframe(count_species_by_order = length(unique(species)),
percent_species_by_order = round(count_species_by_order/n_spp * 100, 2)) %>%
ungroup() %>%
arrange(desc(count_species_by_order)) %>%
slice_head(n = 15) # Limit to the top 15 orders based on the number of species
spp_order
## # A tibble: 15 × 4
## class order count_species_by_order percent_species_by_o…¹
## <chr> <chr> <int> <dbl>
## 1 Actinopterygii Perciformes 115 17.4
## 2 Actinopterygii Cypriniformes 51 7.73
## 3 Actinopterygii Tetraodontiform… 29 4.39
## 4 Actinopterygii Labriformes 27 4.09
## 5 Actinopterygii Carangiformes 24 3.64
## 6 Actinopterygii Siluriformes 24 3.64
## 7 Actinopterygii Spariformes 23 3.48
## 8 Actinopterygii Pleuronectiform… 20 3.03
## 9 Chondrichthyes Carcharhiniform… 20 3.03
## 10 Actinopterygii Gadiformes 19 2.88
## 11 Actinopterygii Salmoniformes 17 2.58
## 12 Actinopterygii Syngnathiformes 17 2.58
## 13 Actinopterygii Characiformes 15 2.27
## 14 Actinopterygii Lutjaniformes 15 2.27
## 15 Actinopterygii Clupeiformes 13 1.97
## # ℹ abbreviated name: ¹percent_species_by_order
# ------------------------------------------------------------------------------
plot_orders_numb <- ggplot(spp_order, aes(x = reorder(order, count_species_by_order), y = count_species_by_order)) +
geom_bar(stat = "identity", fill = "#00AFBB", width = 0.7) +
coord_flip() +
theme_pubr() +
labs(
x = "Order",
y = "Number of Species"
) +
theme(
axis.title.x = element_text(face = "bold", size = 14),
axis.title.y = element_text(face = "bold", size = 14))
plot_orders_numb
# ------------------------------------------------------------------------------
plot_orders_perc <- ggplot(spp_order, aes(x = reorder(order, percent_species_by_order), y = percent_species_by_order)) +
geom_bar(stat = "identity", fill = "#00AFBB", width = 0.7) +
coord_flip() +
theme_pubr() +
labs(
x = "Order",
y = "Percentage of representation in ErythroCite"
) +
theme(
axis.title.x = element_text(face = "bold", size = 14),
axis.title.y = element_text(face = "bold", size = 14))
plot_orders_perc
# Export figure
Figure_6 <- plot_grid(
plot_orders_numb,
plot_orders_perc,
labels = c("A", "B"),
nrow = 2,
ncol = 1,
label_size = 15
)
# Store Plots
ggsave('../manuscript/Figure_6.pdf', Figure_6, width = 7, height = 9)
ggsave('../manuscript/Figure_6.png', Figure_6, width = 7, height = 9, dpi = 1200)
# Counting Studies Per Life Stage
life_stage_by_studies <- dat %>%
mutate(life_stage = ifelse(is.na(life_stage), "not reported", life_stage)) %>%
group_by(life_stage) %>%
summarise(num_studies = n_distinct(key)) %>%
arrange(desc(num_studies))
# Plotting Studies by Life Stage
plot_life_stages <- ggplot(life_stage_by_studies, aes(x = reorder(life_stage, num_studies), y = num_studies)) +
geom_bar(stat = "identity", fill = "#00AFBB", width = 0.7) +
coord_flip() +
theme_pubr() +
labs(
x = "Life Stage",
y = "Number of Studies"
) +
theme(
axis.title.x = element_text(face = "bold", size = 12),
axis.title.y = element_text(face = "bold", size = 12),
axis.text.y = element_text(size = 10) # Adjust font size for life stage names
)
plot_life_stages
# ------------------------------------------------------------------------------
# Counting Studies Per Sex
sex_by_studies <- dat %>%
mutate(sex = ifelse(is.na(sex), "not reported", sex)) %>%
group_by(sex) %>%
summarise(num_studies = n_distinct(key)) %>%
arrange(desc(num_studies))
# Plotting Studies by Sex
plot_sex <- ggplot(sex_by_studies, aes(x = reorder(sex, num_studies), y = num_studies)) +
geom_bar(stat = "identity", fill = "#00AFBB", width = 0.7) +
coord_flip() +
theme_pubr() +
labs(
x = "Sex",
y = "Number of Studies"
) +
theme(
axis.title.x = element_text(face = "bold", size = 12),
axis.title.y = element_text(face = "bold", size = 12),
axis.text.y = element_text(size = 10) # Adjust font size for sex labels
)
plot_sex
# ------------------------------------------------------------------------------
# Counting Studies Per Realm
realm_by_studies <- dat %>%
group_by(realm) %>%
summarise(num_studies = n_distinct(key)) %>%
arrange(desc(num_studies))
# Plotting Studies by Realm
plot_realm <- ggplot(realm_by_studies, aes(x = reorder(realm, num_studies), y = num_studies)) +
geom_bar(stat = "identity", fill = "#00AFBB", width = 0.7) +
coord_flip() +
theme_pubr() +
labs(
x = "Realms",
y = "Number of Studies"
) +
theme(
axis.title.x = element_text(face = "bold", size = 12),
axis.title.y = element_text(face = "bold", size = 12),
axis.text.y = element_text(size = 10) # Adjust font size for realm labels
)
plot_realm
# ------------------------------------------------------------------------------
# Combine Plots
Figure_7 <- plot_grid(
plot_sex,
plot_life_stages,
plot_realm,
labels = c("A", "B", "C"),
nrow = 3,
ncol = 1,
label_size = 15,
align = "hv"
)
# Store Plots
ggsave('../manuscript/Figure_7.pdf', Figure_7, width = 5, height = 8)
ggsave('../manuscript/Figure_7.png', Figure_7, width = 5, height = 8, dpi = 1200)
# Define colors for each class
cols_class <- c("Actinopterygii" = "#ef5675",
"Chondrichthyes" = "#7a5195",
"Cyclostomata" = "#075983",
"Dipnoi" = "#ffa600")
# ------------------------------------------------------------------------------
# Clean the data for cell area
dat_clean_cell_area <- dat %>%
filter(!is.na(cell_area))
# Calculate summary statistics of cell area for each class
df_summary <- dat_clean_cell_area %>%
group_by(class) %>%
summarise(
n_species = n_distinct(species),
n_obs = n(),
max_y_cell_area = max(cell_area, na.rm = TRUE))
# Plot for Cell Area
plot_cell_area <- ggplot(dat_clean_cell_area, aes(x = class, y = cell_area, fill = class)) +
geom_boxplot(width = 0.6,
fill = "white",
outlier.shape = NA) +
geom_text(data = df_summary,
aes(y = max_y_cell_area, label = paste0("N = ", n_obs, "\n(", n_species, " spp.)")),
vjust = -0.5, size = 2) +
theme_pubr() +
theme(
legend.position = "none",
axis.text.x = element_text(size = 6),
axis.text.y = element_text(size = 6),
axis.title.x = element_blank(),
axis.title.y = element_text(size = 10),
panel.border = element_rect(colour = "black", fill = NA, linewidth = 1)
) +
labs(y = expression("Cell area (" * mu * m^2 * ")")) +
scale_y_log10(
breaks = c(10, 20, 50, 150, 350, 1000),
limits = c(NA, 2000)
) +
scale_fill_manual(values = cols_class) + # For boxplot fill
scale_color_manual(values = cols_class) +
geom_point(
aes(colour = class),
size = 1,
alpha = .5,
position = position_jitter(
seed = 1, width = .2
))
plot_cell_area
# ------------------------------------------------------------------------------
# Clean the data for cell volume
dat_clean_cell_volume <- dat %>%
filter(!is.na(cell_volume))
# Calculate summary statistics of cell volume for each class
df_summary <- dat_clean_cell_volume %>%
group_by(class) %>%
summarise(
n_species = n_distinct(species),
n_obs = n(),
max_y_cell_volume = max(cell_volume, na.rm = TRUE))
# Plot for Cell Volume
plot_cell_volume <- ggplot(dat_clean_cell_volume, aes(x = class, y = cell_volume, fill = class)) +
geom_boxplot(width = 0.6,
fill = "white",
outlier.shape = NA) +
geom_text(data = df_summary,
aes(y = max_y_cell_volume, label = paste0("N = ", n_obs, "\n(", n_species, " spp.)")),
vjust = -0.5, size = 2) +
theme_pubr() +
theme(
legend.position = "none",
axis.text.x = element_text(size = 6),
axis.text.y = element_text(size = 6),
axis.title.x = element_blank(),
axis.title.y = element_text(size = 10), # Reduce font size of the y-axis title
panel.border = element_rect(colour = "black", fill = NA, linewidth = 1)
) +
labs(y = expression("Cell volume (" * mu * m^3 * ")")) +
scale_y_log10(
breaks = c(10, 20, 50, 150, 350, 800, 2000, 6000, 20000),
limits = c(NA, 40000)
) +
scale_fill_manual(values = cols_class) + # For boxplot fill
scale_color_manual(values = cols_class) +
geom_point(
aes(colour = class),
size = 1,
alpha = .5,
position = position_jitter(
seed = 1, width = .2
))
plot_cell_volume
# ------------------------------------------------------------------------------
# Clean the data for nucleus area
dat_clean_nucleus_area <- dat %>%
filter(!is.na(nucleus_area))
# Calculate summary statistics of nucleus area for each class
df_summary <- dat_clean_nucleus_area %>%
group_by(class) %>%
summarise(
n_species = n_distinct(species),
n_obs = n(),
max_y_nucleus_area = max(nucleus_area, na.rm = TRUE))
# Plot for Nucleus Area
plot_nucleus_area <- ggplot(dat_clean_nucleus_area, aes(x = class, y = nucleus_area, fill = class)) +
geom_boxplot(width = 0.6,
fill = "white",
outlier.shape = NA) +
geom_text(data = df_summary,
aes(y = max_y_nucleus_area, label = paste0("N = ", n_obs, "\n(", n_species, " spp.)")),
vjust = -0.5, size = 2) +
theme_pubr() +
theme(
legend.position = "none",
axis.text.x = element_text(size = 6),
axis.text.y = element_text(size = 6),
axis.title.x = element_blank(),
axis.title.y = element_text(size = 10),
panel.border = element_rect(colour = "black", fill = NA, linewidth = 1)
) +
labs(y = expression("Nucleus area (" * mu * m^2 * ")")) +
scale_y_log10(
breaks = c(5, 10, 20, 50, 100, 200), # 0 is undefined on a log10 axis, so it is not used as a break
limits = c(NA, 250)
) +
scale_fill_manual(values = cols_class) + # For boxplot fill
scale_color_manual(values = cols_class) +
geom_point(
aes(colour = class),
size = 1,
alpha = .5,
position = position_jitter(
seed = 1, width = .2
))
plot_nucleus_area
# ------------------------------------------------------------------------------
# Clean the data for nucleus volume
dat_clean_nucleus_volume <- dat %>%
filter(!is.na(nucleus_volume))
# Calculate summary statistics of nucleus volume for each class
df_summary <- dat_clean_nucleus_volume %>%
group_by(class) %>%
summarise(
n_species = n_distinct(species),
n_obs = n(),
max_y_nucleus_volume = max(nucleus_volume, na.rm = TRUE))
# Plot for Nucleus Volume
plot_nucleus_volume <- ggplot(dat_clean_nucleus_volume, aes(x = class, y = nucleus_volume, fill = class)) +
geom_boxplot(width = 0.6,
fill = "white",
outlier.shape = NA) +
geom_text(data = df_summary,
aes(y = max_y_nucleus_volume, label = paste0("N = ", n_obs, "\n(", n_species, " spp.)")),
vjust = -0.5, size = 2) +
theme_pubr() +
theme(
legend.position = "none",
axis.text.x = element_text(size = 6),
axis.text.y = element_text(size = 6),
axis.title.x = element_blank(),
axis.title.y = element_text(size = 10),
panel.border = element_rect(colour = "black", fill = NA, linewidth = 1)
) +
labs(y = expression("Nucleus volume (" * mu * m^3 * ")")) +
scale_y_log10(
breaks = c(10, 20, 50, 150, 350, 1000),
limits = c(NA, 2000)
) +
scale_fill_manual(values = cols_class) + # For boxplot fill
scale_color_manual(values = cols_class) + # For point colours
geom_point(
aes(colour = class),
size = 1,
alpha = .5,
position = position_jitter(
seed = 1, width = .2
))
plot_nucleus_volume
# ------------------------------------------------------------------------------
# Clean the data for mcv
dat_clean_mcv <- dat %>%
filter(!is.na(mcv))
# Calculate summary statistics of mcv for each class
df_summary <- dat_clean_mcv %>%
group_by(class) %>%
summarise(
n_species = n_distinct(species),
n_obs = n(),
max_y_mcv = max(mcv, na.rm = TRUE))
plot_mcv <- ggplot(dat_clean_mcv, aes(x = class, y = mcv, fill = class)) +
geom_boxplot(width = 0.6,
fill = "white",
outlier.shape = NA) +
geom_text(data = df_summary,
aes(y = max_y_mcv, label = paste0("N = ", n_obs, "\n(", n_species, " spp.)")),
vjust = -0.5, size = 2) +
theme_pubr() +
theme(
legend.position = "none",
axis.text.x = element_text(size = 6),
axis.text.y = element_text(size = 6),
axis.title.x = element_blank(),
axis.title.y = element_text(size = 10),
panel.border = element_rect(colour = "black", fill = NA, linewidth = 1)
) +
labs(y = expression("Mean corpuscular volume (" * mu * m^3 * ")")) +
scale_y_log10(
breaks = c(10, 20, 50, 150, 350, 800, 2000, 6000),
limits = c(NA, 15000)
) +
scale_fill_manual(values = cols_class) + # For boxplot fill
scale_color_manual(values = cols_class) + # For point colours
geom_point(
aes(colour = class),
size = 1,
alpha = .5,
position = position_jitter(
seed = 1, width = .2
))
plot_mcv
# ------------------------------------------------------------------------------
# Align the plots in a grid layout
Figure_8 <- plot_grid(
plot_cell_area + theme(plot.margin = unit(c(1, 0, 0, 0.2), "cm")),
plot_nucleus_area + theme(plot.margin = unit(c(1, 0.5, 0, 0), "cm")),
plot_cell_volume + theme(plot.margin = unit(c(1, 0, 0, 0.2), "cm")),
plot_nucleus_volume + theme(plot.margin = unit(c(1, 0.5, 0, 0), "cm")),
plot_mcv + theme(plot.margin = unit(c(1, 0, 0, 0.2), "cm")),
nrow = 3,
label_size = 14, # Slightly smaller labels for compact layout
label_fontface = "bold", # Make labels bold for clarity
label_x = 0.1, # Move labels closer to the plots horizontally
label_y = 0.96, # Adjust vertical placement of labels
align = "hv" # Ensure alignment
)
## Store Plots
ggsave('../manuscript/Figure_8.pdf', Figure_8, width = 7, height = 9)
ggsave('../manuscript/Figure_8.png', Figure_8, width = 7, height = 9, dpi = 1500)
df_years <- refs %>%
as.data.frame() %>%
select(year) %>%
mutate(year = as.numeric(as.character(year))) %>% # Convert the 'year' column to numeric
filter(!is.na(year)) %>% # Remove rows where 'year' is missing (NA)
group_by(year) %>%
summarise(num_studies = n()) %>%
arrange(year) %>%
mutate(cumulative_studies = cumsum(num_studies)) # Calculate cumulative count
df_years %>%
reframe(min_year = min(year),
max_year = max(year),
total_years = max_year - min_year)
## # A tibble: 1 × 3
## min_year max_year total_years
## <dbl> <dbl> <dbl>
## 1 1875 2024 149
journals <- refs %>%
as.data.frame() %>%
select(journal) %>%
filter(!is.na(journal)) %>%
group_by(journal) %>%
summarise(num_articles = n()) %>%
arrange(desc(num_articles))
journals %>%
slice_head(n = 15)
## # A tibble: 15 × 2
## journal num_articles
## <chr> <int>
## 1 Journal of Fish Biology 14
## 2 Fish Physiology and Biochemistry 9
## 3 Aquaculture 4
## 4 Aquaculture Research 4
## 5 Tissue and Cell 4
## 6 Aquaculture International 3
## 7 Ecotoxicology and Environmental Safety 3
## 8 Environmental Science and Pollution Research 3
## 9 Experimental medicine and surgery 3
## 10 Fish and Shellfish Immunology 3
## 11 Iranian Journal of Fisheries Sciences 3
## 12 Journal of Applied Ichthyology 3
## 13 Russian Journal of Marine Biology 3
## 14 Aquatic Toxicology 2
## 15 Brazilian Journal of Biology 2
dat %>%
filter(between(long_dec, -180, 180), between(lat_dec, -90, 90)) %>% # Include coordinates within expected ranges
mutate(hemisphere = ifelse(lat_dec > 0, "Northern","Southern")) %>%
group_by(hemisphere) %>%
reframe(
unique_studies = n_distinct(key),
unique_positions = n_distinct(paste(lat_dec, long_dec, sep = "_")), # Count unique positions in the database, because some studies report more than one position
unique_species = n_distinct(species_reported),
records = n())
## # A tibble: 2 × 5
## hemisphere unique_studies unique_positions unique_species records
## <chr> <int> <int> <int> <int>
## 1 Northern 112 131 259 1119
## 2 Southern 21 23 34 120
n_spp <- dat %>%
distinct(species) %>%
nrow()
n_spp
## [1] 660
dat %>%
group_by(class) %>%
reframe(n_spp = length(unique(species)), total_study = length(unique(key)),
perc_species = (n_spp/660)* 100)
## # A tibble: 4 × 4
## class n_spp total_study perc_species
## <chr> <int> <int> <dbl>
## 1 Actinopterygii 595 180 90.2
## 2 Chondrichthyes 57 15 8.64
## 3 Cyclostomata 5 4 0.758
## 4 Dipnoi 3 3 0.455
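The percentages above divide by a hard-coded 660. As a hedged alternative, the sketch below derives the denominator from the data, so the shares stay correct if records are added or removed. It uses base R only and a made-up data frame (`toy_records`, with hypothetical class and species labels), not the ErythroCite data.

```r
# Toy records: hypothetical classes and species, for illustration only
toy_records <- data.frame(
  class   = c("A", "A", "A", "B"),
  species = c("sp1", "sp1", "sp2", "sp3")
)

# Total number of distinct species, computed rather than hard-coded
n_total <- length(unique(toy_records$species))

# Distinct species per class and their share of the total
n_by_class <- tapply(toy_records$species, toy_records$class,
                     function(s) length(unique(s)))
perc_by_class <- round(n_by_class / n_total * 100, 1)
perc_by_class
```

With the real data, the same idea would amount to replacing the literal 660 with a value computed from `dat`, taking care not to reuse the name `n_spp` for both the global total and the per-class column.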
dat %>%
group_by(sex) %>%
reframe(n_spp = length(unique(species)),
total_studies = length(unique(key))) %>%
arrange(desc(total_studies))
## # A tibble: 4 × 3
## sex n_spp total_studies
## <fct> <int> <int>
## 1 <NA> 623 155
## 2 female 31 20
## 3 male 30 18
## 4 both 29 12
dat %>%
group_by(life_stage) %>%
reframe(n_spp = length(unique(species)),
total_studies = length(unique(key))) %>%
arrange(desc(total_studies))
## # A tibble: 4 × 3
## life_stage n_spp total_studies
## <fct> <int> <int>
## 1 <NA> 611 104
## 2 adult 61 49
## 3 juvenile 30 32
## 4 fingerlings 3 4
dat %>%
group_by(realm) %>%
reframe(n_spp = length(unique(species)),
total_studies = length(unique(key))) %>%
arrange(desc(total_studies))
## # A tibble: 5 × 3
## realm n_spp total_studies
## <chr> <int> <int>
## 1 freshwater-brackish 54 71
## 2 freshwater 93 62
## 3 freshwater-brackish-marine 73 60
## 4 marine 307 29
## 5 marine-brackish 133 24
The following function identifies, for each trait, the species with the lowest and highest values, and reports the absolute range and the fold difference (maximum divided by minimum) between them.
get_summary <- function(var_name) {
dat %>%
group_by(species) %>%
summarise(
min_val = min(!!sym(var_name), na.rm = TRUE),
max_val = max(!!sym(var_name), na.rm = TRUE)
) %>%
summarise(
min_species = species[which.min(min_val)],
min_value = min(min_val, na.rm = TRUE),
max_species = species[which.max(max_val)],
max_value = max(max_val, na.rm = TRUE),
range = max_value - min_value,
magnitude_order = max_value / min_value
) %>%
mutate(variable = var_name) %>%
select(variable, min_species, min_value, max_species, max_value, range, magnitude_order)
}
variables <- c("cell_area", "nucleus_area", "cell_volume", "nucleus_volume", "mcv")
summary_table <- bind_rows(lapply(variables, get_summary))
summary_table
## # A tibble: 5 × 7
## variable min_species min_value max_species max_value range magnitude_order
## <chr> <chr> <dbl> <chr> <dbl> <dbl> <dbl>
## 1 cell_area Iranocichl… 16.2 Protopteru… 945. 928. 58.2
## 2 nucleus_ar… Iranocichl… 2.56 Proscymnod… 157. 155. 61.5
## 3 cell_volume Iranocichl… 41.1 Protopteru… 17024. 16983. 414.
## 4 nucleus_vo… Iranocichl… 3.42 Protopteru… 710. 706. 207.
## 5 mcv Solea sene… 14.4 Protopteru… 6940 6926. 482.
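The `magnitude_order` column is a ratio (maximum divided by minimum), not a count of orders of magnitude. As a small worked check, the sketch below recomputes the fold difference for `cell_volume` from the rounded values printed in the table and converts it to base-10 orders of magnitude. Because the inputs are the rounded printed values, the result is only approximate; the variable names are chosen here for illustration.

```r
# Values as printed (rounded) in the summary table for cell_volume
min_value <- 41.1
max_value <- 17024

# Fold difference: what the table calls `magnitude_order`
fold_range <- max_value / min_value

# The same span expressed in base-10 orders of magnitude
orders_of_magnitude <- log10(fold_range)

round(fold_range)
round(orders_of_magnitude, 2)
```

The rounded fold difference agrees with the 414-fold range in cell volume reported for the database.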
summary_data_act <- dat %>%
filter(class == "Actinopterygii") %>%
group_by(species_underscored) %>%
summarise(across(c(cell_area, cell_volume, nucleus_area, nucleus_volume, mcv),
~ mean(., na.rm = TRUE)),
.groups = "drop") %>%
rename(
"Cell area" = cell_area,
"Cell volume" = cell_volume,
"Nucleus area" = nucleus_area,
"Nucleus volume" = nucleus_volume,
"MCV" = mcv
)
summary_data_act <- summary_data_act %>%
mutate(across(c("Cell area", "Cell volume", "Nucleus area", "Nucleus volume", "MCV"),
~ (.-min(., na.rm = TRUE)) / (max(., na.rm = TRUE) - min(., na.rm = TRUE)))) # scale min-max by variable
summary_data_act
## # A tibble: 595 × 6
## species_underscored `Cell area` `Cell volume` `Nucleus area` `Nucleus volume`
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Abalistes_stellatus 0.0722 NaN 0.116 NaN
## 2 Abramis_brama 0.112 0.104 NaN NaN
## 3 Abudefduf_saxatilis 0.100 0.0949 0.0373 NaN
## 4 Abudefduf_septemfa… NaN NaN NaN 0.0152
## 5 Abudefduf_sordidus 0.179 0.212 0.452 0.155
## 6 Abudefduf_taurus 0.111 0.111 NaN NaN
## 7 Abudefduf_vaigiens… 0.186 0.230 0.447 0.148
## 8 Acanthocybium_sola… 0.150 0.143 NaN NaN
## 9 Acanthogobius_hasta 0.159 0.171 0.413 0.211
## 10 Acanthopagrus_aust… NaN NaN NaN 0.0339
## # ℹ 585 more rows
## # ℹ 1 more variable: MCV <dbl>
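The `NaN` entries in the tibble above come from taking the mean of a species with no measurements for a trait (`mean()` of an empty set with `na.rm = TRUE`), and they pass through the min-max scaling unchanged. The sketch below shows both steps on invented values in base R. The toy data and the helper name `min_max` are assumptions for illustration.

```r
# Hypothetical per-record values for four species; sp_c has no measurement
toy <- data.frame(
  species   = c("sp_a", "sp_a", "sp_b", "sp_c", "sp_d"),
  cell_area = c(20, 40, 100, NA, 60)
)

# Species means with NAs removed; a species with only NAs yields NaN
means <- tapply(toy$cell_area, toy$species, mean, na.rm = TRUE)

# Min-max scaling to [0, 1]; NaN is ignored when finding the extremes
# (na.rm = TRUE) and remains NaN in the result
min_max <- function(x) {
  (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}

scaled <- min_max(means)
scaled
```

Since each trait is scaled separately, a value of 1 marks the species with the largest mean for that trait within the subset being plotted, not across the whole database.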
unique(summary_data_act$species_underscored)
## [1] "Abalistes_stellatus" "Abramis_brama"
## [3] "Abudefduf_saxatilis" "Abudefduf_septemfasciatus"
## [5] "Abudefduf_sordidus" "Abudefduf_taurus"
## [7] "Abudefduf_vaigiensis" "Acanthocybium_solandri"
## [9] "Acanthogobius_hasta" "Acanthopagrus_australis"
## [11] "Acanthopagrus_butcheri" "Acanthostracion_polygonius"
## [13] "Acanthostracion_quadricornis" "Acanthurus_bahianus"
## [15] "Acanthurus_chirurgus" "Acanthurus_coeruleus"
## [17] "Acanthurus_gahhm" "Acanthurus_grammoptilus"
## [19] "Acipenser_brevirostrum" "Acipenser_oxyrinchus"
## [21] "Acipenser_sinensis" "Acipenser_sturio"
## [23] "Aeoliscus_strigatus" "Albula_vulpes"
## [25] "Aldrichetta_forsteri" "Alosa_fallax"
## [27] "Alphestes_afer" "Aluterus_schoepfii"
## [29] "Aluterus_scriptus" "Ameiurus_catus"
## [31] "Ammodytes_tobianus" "Amphiprion_akindynos"
## [33] "Amphiprion_clarkii" "Anabas_testudineus"
## [35] "Anguilla_bicolor" "Anguilla_japonica"
## [37] "Anguilla_marmorata" "Anguilla_rostrata"
## [39] "Aplodactylus_arctidens" "Apogon_maculatus"
## [41] "Aprion_virescens" "Archosargus_rhomboidalis"
## [43] "Arothron_manilensis" "Arripis_trutta"
## [45] "Astyanax_lineatus" "Astyanax_mexicanus"
## [47] "Atheresthes_evermanni" "Atractosteus_tristoechus"
## [49] "Aulacocephalus_temminckii" "Auxis_thazard"
## [51] "Bairdiella_ronchus" "Balistapus_undulatus"
## [53] "Balistes_capriscus" "Balistes_carolinensis"
## [55] "Balistes_vetula" "Barbatula_barbatula"
## [57] "Barbatula_toni" "Basilichthys_australis"
## [59] "Bathygobius_soporator" "Belone_belone"
## [61] "Bentartia_pusillum" "Betta_splendens"
## [63] "Boops_boops" "Boreogadus_saida"
## [65] "Bothus_lunatus" "Bovichtus_angustifrons"
## [67] "Brachygenys_chrysargyrea" "Brevoortia_tyrannus"
## [69] "Brycon_hilarii" "Bujurquina_vittata"
## [71] "Caelorinchus_innotabilis" "Caesio_cuning"
## [73] "Calamus_bajonado" "Calamus_calamus"
## [75] "Calamus_penna" "Calamus_pennatula"
## [77] "Callionymus_lyra" "Cantherhines_pullus"
## [79] "Canthigaster_bennetti" "Canthigaster_valentini"
## [81] "Carangoides_bartholomaei" "Carangoides_ruber"
## [83] "Caranx_carangus" "Caranx_hippos"
## [85] "Caranx_ignobilis" "Caranx_latus"
## [87] "Caranx_lugubris" "Caranx_sexfasciatus"
## [89] "Carassius_auratus" "Carassius_carassius"
## [91] "Carassius_gibelio" "Catostomus_catostomus"
## [93] "Catostomus_commersonii" "Centriscops_humerosus"
## [95] "Centropomus_undecimalis" "Centropyge_bicolor"
## [97] "Cephalopholis_cruentata" "Cephalopholis_fulva"
## [99] "Cephalopholis_miniata" "Chaetodipterus_faber"
## [101] "Chaetodon_capistratus" "Chaetodon_lunulatus"
## [103] "Chaetodon_ocellatus" "Chaetodon_rainfordi"
## [105] "Chaetodon_sedentarius" "Chaetodon_striatus"
## [107] "Channa_argus" "Channa_punctata"
## [109] "Channa_striata" "Cheilinus_trilobatus"
## [111] "Chelidonichthys_cuculus" "Chelidonichthys_lucerna"
## [113] "Chelon_ramada" "Chilomycterus_spinosus"
## [115] "Chirocentrus_dorab" "Chloroscombrus_chrysurus"
## [117] "Choerodon_albigena" "Choerodon_cephalotes"
## [119] "Choerodon_fasciatus" "Chromis_analis"
## [121] "Chromis_viridis" "Chrosomus_neogaeus"
## [123] "Chrysiptera_cyanea" "Cichlasoma_dimerus"
## [125] "Ciliata_mustela" "Cirrhinus_mrigala"
## [127] "Cirrhinus_reba" "Clarias_batrachus"
## [129] "Clarias_gariepinus" "Clupea_harengus"
## [131] "Clupeonella_cultriventris" "Cobitis_biwae"
## [133] "Cobitis_striata" "Cobitis_taenia"
## [135] "Cobitis_takatsuensis" "Coelorinchus_maurofasciatus"
## [137] "Colossoma_macropomum" "Conger_conger"
## [139] "Contusus_brevicaudus" "Coregonus_clupeaformis"
## [141] "Coregonus_maraena" "Coreius_guichenoti"
## [143] "Coris_batuensis" "Coryphaena_hippurus"
## [145] "Coryphaenoides_serrulatus" "Corythoichthys_intestinalis"
## [147] "Cottus_gobio" "Crossosalarias_macrospilus"
## [149] "Cryptacanthodes_maculatus" "Cryptocentrus_leptocephalus"
## [151] "Ctenopharyngodon_idella" "Cyclopteropsis_jordani"
## [153] "Cyclopterus_lumpus" "Cyprinus_carpio"
## [155] "Dactylopterus_volitans" "Danio_rerio"
## [157] "Dascyllus_aruanus" "Datnioides_polota"
## [159] "Delminichthys_ghetaldii" "Diagramma_labiosum"
## [161] "Diagramma_picta" "Diapterus_rhombeus"
## [163] "Diastobranchus_capensis" "Dicentrarchus_labrax"
## [165] "Diodon_holocanthus" "Diplodus_argenteus"
## [167] "Diplodus_vulgaris" "Diretmichthys_parini"
## [169] "Dischistodus_prosopotaenia" "Dissostichus_mawsoni"
## [171] "Dormitator_latifrons" "Drepane_punctata"
## [173] "Echeneis_naucrates" "Ecsenius_mandibularis"
## [175] "Ecsenius_yaeyamaensis" "Electrophorus_electricus"
## [177] "Ellochelon_vaigiensis" "Elopichthys_bambusa"
## [179] "Elops_saurus" "Engraulis_anchoita"
## [181] "Engraulis_encrasicolus" "Epalzeorhynchos_bicolor"
## [183] "Epalzeorhynchos_frenatum" "Epinephelus_adscensionis"
## [185] "Epinephelus_cyanopodus" "Epinephelus_fasciatus"
## [187] "Epinephelus_guttatus" "Epinephelus_merra"
## [189] "Epinephelus_ongus" "Epinephelus_quoyans"
## [191] "Epinephelus_spilotoceps" "Epinephelus_striatus"
## [193] "Equetus_pulcher" "Esox_lucius"
## [195] "Esox_niger" "Eucinostomus_argenteus"
## [197] "Eucinostomus_gula" "Eugerres_plumieri"
## [199] "Eumicrotremus_spinosus" "Eupomacentrus_fuscus"
## [201] "Eupomacentrus_leucostictus" "Eupomacentrus_variabilis"
## [203] "Euristhmus_lepturus" "Euthynnus_alletteratus"
## [205] "Eutrigla_gurnardus" "Farlowella_acus"
## [207] "Fistularia_petimba" "Gadus_morhua"
## [209] "Gaidropsarus_ensis" "Gaidropsarus_mediterraneus"
## [211] "Galaxias_maculatus" "Galaxias_olidus"
## [213] "Gambusia_holbrooki" "Gasterosteus_aculeatus"
## [215] "Gerres_cinereus" "Gerres_filamentosus"
## [217] "Gerres_subfasciatus" "Girella_elevata"
## [219] "Girella_zebra" "Glyptocephalus_cynoglossus"
## [221] "Glyptosternon_maculatum" "Gnathanodon_speciosus"
## [223] "Gobio_gobio" "Gobiocypris_rarus"
## [225] "Gobiodon_citrinus" "Gobius_cobitis"
## [227] "Gymnelus_viridis" "Gymnocephalus_cernua"
## [229] "Gymnocranius_audleyi" "Gymnocypris_eckloni"
## [231] "Gymnothorax_funebris" "Gymnothorax_pictus"
## [233] "Gymnothorax_vicinus" "Gymnotus_inaequilabiatus"
## [235] "Gyrinocheilus_aymonieri" "Haemulon_aurolineatum"
## [237] "Haemulon_flavolineatum" "Haemulon_plumierii"
## [239] "Haemulon_sciurus" "Halargyreus_johnsonii"
## [241] "Halichoeres_biocellatus" "Halichoeres_bivittatus"
## [243] "Halichoeres_garnoti" "Halichoeres_radiatus"
## [245] "Harengula_humeralis" "Helicolenus_barathri"
## [247] "Helicolenus_percoides" "Hemiglyphidodon_plagiometopon"
## [249] "Hemiramphus_brasiliensis" "Hemitripterus_americanus"
## [251] "Heteropneustes_fossilis" "Heterotis_niloticus"
## [253] "Hippocampus_abdominalis" "Hippoglossus_hippoglossus"
## [255] "Hirundichthys_affinis" "Holacanthus_bermudensis"
## [257] "Holacanthus_ciliaris" "Holacanthus_tricolor"
## [259] "Holocentrus_ascensionis" "Holocentrus_rufus"
## [261] "Hoplias_malabaricus" "Hoplisoma_metae"
## [263] "Hoplisoma_paleatus" "Hucho_hucho"
## [265] "Huso_huso" "Hypophthalmichthys_molitrix"
## [267] "Hypophthalmichthys_nobilis" "Hypoplectrus_unicolor"
## [269] "Hyporhamphus_melanochir" "Hypostomus_boulengeri"
## [271] "Hypostomus_plecostomus" "Icelus_spatula"
## [273] "Ictalurus_punctatus" "Idiacanthus_atlanticus"
## [275] "Iranocichla_hormuzensis" "Istigobius_rigilius"
## [277] "Jenynsia_lineata" "Kathetostoma_canaster"
## [279] "Katsuwonus_pelamis" "Konosirus_punctatus"
## [281] "Labeo_catla" "Labeo_chrysophekadion"
## [283] "Labeo_rohita" "Labrisomus_nuchipinnis"
## [285] "Lachnolaimus_maximus" "Lactophrys_trigonus"
## [287] "Lagocephalus_lunaris" "Lampris_regius"
## [289] "Lates_calcarifer" "Lefua_echigonia"
## [291] "Lefua_nikkonis" "Leiopotherapon_unicolor"
## [293] "Lepidopsetta_bilineata" "Lepomis_macrochirus"
## [295] "Lethrinus_atkinsoni" "Lethrinus_miniatus"
## [297] "Lethrinus_nebulosus" "Lethrinus_rubrioperculatus"
## [299] "Leucaspius_delineatus" "Leuciscus_idus"
## [301] "Limanda_aspera" "Limanda_limanda"
## [303] "Liparis_tunicatus" "Lipophrys_pholis"
## [305] "Lophius_americanus" "Lophius_piscatorius"
## [307] "Lutjanus_adetii" "Lutjanus_analis"
## [309] "Lutjanus_apodus" "Lutjanus_carponotatus"
## [311] "Lutjanus_cyanopterus" "Lutjanus_fulviflamma"
## [313] "Lutjanus_griseus" "Lutjanus_lutjanus"
## [315] "Lutjanus_russellii" "Lutjanus_sebae"
## [317] "Lutjanus_synagris" "Lutjanus_vitta"
## [319] "Lutjanus_vivanus" "Lycodichthys_dearborni"
## [321] "Macropodus_opercularis" "Macrourus_berglax"
## [323] "Makaira_nigricans" "Megaleporinus_macrocephalus"
## [325] "Megalobrama_amblycephala" "Megalops_cyprinoides"
## [327] "Melanogrammus_aeglefinus" "Merlangius_merlangus"
## [329] "Merluccius_bilinearis" "Merluccius_hubbsi"
## [331] "Merluccius_merluccius" "Mesogobius_batrachocephalus"
## [333] "Mesovagus_antipodum" "Metynnis_hypsauchen"
## [335] "Metynnis_maculatus" "Micropogonias_furnieri"
## [337] "Micropterus_coosae" "Micropterus_salmoides"
## [339] "Microspathodon_chrysurus" "Misgurnus_anguillicaudatus"
## [341] "Mola_mola" "Monopterus_albus"
## [343] "Morone_americana" "Morone_saxatilis"
## [345] "Mugil_cephalus" "Mugil_curema"
## [347] "Mugil_liza" "Mulloidichthys_martinicus"
## [349] "Mulloidichthys_vanicolensis" "Mullus_barbatus"
## [351] "Mullus_surmuletus" "Muraenesox_cinereus"
## [353] "Myoxocephalus_octodecemspinosus" "Myoxocephalus_quadricornis"
## [355] "Myoxocephalus_scorpius" "Myripristis_jacobus"
## [357] "Mystus_vittatus" "Myxocyprinus_asiaticus"
## [359] "Myxus_elongatus" "Myzopsetta_ferruginea"
## [361] "Nectamia_savayensis" "Nematalosa_come"
## [363] "Neocyttus_rhomboidalis" "Neogobius_melanostomus"
## [365] "Neoscopelus_macrolepidotus" "Niwaella_delicata"
## [367] "Notolabrus_tetricus" "Notopterus_notopterus"
## [369] "Novaculichthys_taeniourus" "Nuchequula_decora"
## [371] "Ocyurus_chrysurus" "Odontesthes_argentinensis"
## [373] "Ogcocephalus_vespertilio" "Oligoplites_saurus"
## [375] "Oncorhynchus_keta" "Oncorhynchus_kisutch"
## [377] "Oncorhynchus_mykiss" "Oncorhynchus_tshawytscha"
## [379] "Ophichthus_cephalozona" "Oplopomus_oplopomus"
## [381] "Opsanus_tau" "Oreochromis_mossambicus"
## [383] "Oreochromis_niloticus" "Osmerus_eperlanus"
## [385] "Osmerus_mordax" "Ostorhinchus_cookii"
## [387] "Ostorhinchus_endekataenia" "Ostorhinchus_guamensis"
## [389] "Pachypanchax_playfairii" "Pagellus_bogaraveo"
## [391] "Pagellus_erythrinus" "Pagrus_auratus"
## [393] "Pangasianodon_hypophthalmus" "Parachanna_obscura"
## [395] "Paragobiodon_xanthosoma" "Paralichthys_lethostigma"
## [397] "Paralichthys_olivaceus" "Paramugil_georgii"
## [399] "Parapercis_cylindrica" "Parapercis_hexophtalma"
## [401] "Parupeneus_forsskali" "Pelates_quadrilineatus"
## [403] "Pempheris_schomburgkii" "Perca_flavescens"
## [405] "Perca_fluviatilis" "Petroscirtes_fallax"
## [407] "Petroscirtes_lupus" "Petroscirtes_mitratus"
## [409] "Phoxinus_phoxinus" "Piaractus_brachypomus"
## [411] "Piaractus_mesopotamicus" "Pimelodella_gracilis"
## [413] "Plagioscion_squamosissimus" "Plagiotremus_rhinorhynchos"
## [415] "Planiliza_macrolepis" "Platichthys_flesus"
## [417] "Platybelone_argala" "Platycephalus_bassensis"
## [419] "Platycephalus_indicus" "Plectropomus_leopardus"
## [421] "Pleuronectes_platessa" "Podothecus_accipenserinus"
## [423] "Poecilia_mexicana" "Poecilia_reticulata"
## [425] "Pollachius_virens" "Polypterus_palmas"
## [427] "Polypterus_senegalus" "Pomacanthus_arcuatus"
## [429] "Pomacanthus_paru" "Pomacentrus_nagasakiensis"
## [431] "Pomadasys_kaakan" "Porichthys_porosissimus"
## [433] "Premnas_biaculeatus" "Priacanthus_tayenus"
## [435] "Prionotus_carolinus" "Prionotus_evolans"
## [437] "Pristiapogon_kallopterus" "Prochilodus_lineatus"
## [439] "Prognathodes_aculeatus" "Prosopium_cylindraceum"
## [441] "Psalidodon_anisitsi" "Psettodes_erumei"
## [443] "Pseudaphritis_urvillii" "Pseudocaranx_dentex"
## [445] "Pseudomonacanthus_peroni" "Pseudoplatystoma_corruscans"
## [447] "Pseudopleuronectes_americanus" "Pseudorhombus_jenynsii"
## [449] "Pseudupeneus_maculatus" "Pterois_volitans"
## [451] "Pterophyllum_scalare" "Pterygoplichthys_pardalis"
## [453] "Pungitius_pungitius" "Pygocentrus_nattereri"
## [455] "Rachycentron_canadum" "Rastrelliger_kanagurta"
## [457] "Repomucenus_limiceps" "Rhamdia_quelen"
## [459] "Rhamphichthys_rostratus" "Rhinesomus_triqueter"
## [461] "Rhynchocypris_lagowskii" "Rita_rita"
## [463] "Rutilus_kutum" "Rutilus_rutilus"
## [465] "Rypticus_saponaceus" "Salminus_affinis"
## [467] "Salmo_caspius" "Salmo_salar"
## [469] "Salmo_trutta" "Salvelinus_alpinus"
## [471] "Salvelinus_fontinalis" "Salvelinus_namaycush"
## [473] "Salvelinus_umbla" "Sander_vitreus"
## [475] "Sarda_australis" "Sardina_pilchardus"
## [477] "Sardinella_gibbosa" "Sarpa_salpa"
## [479] "Scardinius_erythrophthalmus" "Scarus_coeruleus"
## [481] "Scarus_croicensis" "Scarus_ghobban"
## [483] "Scarus_guacamaia" "Scarus_schlegeli"
## [485] "Scarus_taeniopterus" "Schizopyge_niger"
## [487] "Schizothorax_plagiostomus" "Schizothorax_prenanti"
## [489] "Scleropages_jardinii" "Scolopsis_monogramma"
## [491] "Scolopsis_vosmeri" "Scomber_scombrus"
## [493] "Scomberomorus_regalis" "Scophthalmus_maximus"
## [495] "Scophthalmus_rhombus" "Scorpaena_cardinalis"
## [497] "Scorpaena_plumieri" "Scorpaena_porcus"
## [499] "Scorpaenopsis_oxycephalus" "Scorpis_aequipinnis"
## [501] "Sebastes_alutus" "Sebastes_marinus"
## [503] "Sebastes_ocutalus" "Sebastes_polyspinis"
## [505] "Sebastes_schlegelii" "Selene_vomer"
## [507] "Selenotoca_multifasciata" "Semotilus_corporalis"
## [509] "Seriola_hippos" "Seriola_lalandi"
## [511] "Seriola_quinqueradiata" "Serrasalmus_eigenmanni"
## [513] "Siganus_doliatus" "Siganus_fuscescens"
## [515] "Siganus_lineatus" "Siganus_spinus"
## [517] "Siganus_sutor" "Signigobius_biocellatus"
## [519] "Sillaginodes_punctatus" "Sillago_analis"
## [521] "Silurus_asotus" "Siniperca_chuatsi"
## [523] "Solea_senegalensis" "Solea_solea"
## [525] "Soleichthys_heterorhinos" "Sorubim_cuspicaudus"
## [527] "Sorubim_lima" "Sparisoma_aurofrenatum"
## [529] "Sparisoma_chrysopterum" "Sparisoma_radians"
## [531] "Sparisoma_viride" "Sparus_aurata"
## [533] "Sphoeroides_greeleyi" "Sphoeroides_maculatus"
## [535] "Sphoeroides_spengleri" "Sphoeroides_testudineus"
## [537] "Sphyraena_barracuda" "Sphyraena_obtusata"
## [539] "Spicara_maena" "Sprattus_sprattus"
## [541] "Squalius_cephalus" "Stenotomus_chrysops"
## [543] "Stephanolepis_hispida" "Sufflamen_fraenatus"
## [545] "Symphodus_tinca" "Synbranchus_marmoratus"
## [547] "Syngnathus_fuscus" "Syngnathus_scovelli"
## [549] "Syngnathus_typhle" "Synodontis_notatus"
## [551] "Synodus_intermedius" "Synodus_sageneus"
## [553] "Tachysurus_fulvidraco" "Taurulus_bubalis"
## [555] "Tautoga_onitis" "Terapon_jarbua"
## [557] "Terapon_puta" "Tetrabrachium_ocellatum"
## [559] "Tetraodon_nigroviridis" "Thalassoma_bifasciatum"
## [561] "Thalassoma_klunzingeri" "Thalassoma_lucasanum"
## [563] "Thalassoma_lunare" "Thunnus_alalunga"
## [565] "Thunnus_albacares" "Thunnus_atlanticus"
## [567] "Thymallus_arcticus" "Thymallus_thymallus"
## [569] "Tinca_tinca" "Trachinocephalus"
## [571] "Trachinotus_botla" "Trachinotus_coppingeri"
## [573] "Trachinotus_falcatus" "Trachinus_draco"
## [575] "Trachurus_trachurus" "Trachystoma_petardi"
## [577] "Trematomus_bernacchii" "Tripodichthys_angustifrons"
## [579] "Trisopterus_luscus" "Turrum_fulvoguttatum"
## [581] "Turrum_gymnostethus" "Tylosurus_gavialoides"
## [583] "Ucla_xenogrammus" "Ulua_aurochs"
## [585] "Umbrina_coroides" "Upeneichthys_lineatus"
## [587] "Uranoscopus_scaber" "Urophycis_tenuis"
## [589] "Valenciennea_longipinnis" "Xiphias_gladius"
## [591] "Xiphophorus_hellerii" "Zebrasoma_scopas"
## [593] "Zenopsis_nebulosus" "Zeus_faber"
## [595] "Zoarces_americanus"
# Identify species to drop
species_to_drop <- setdiff(tree$tip.label, summary_data_act$species_underscored)
# This drops 80 species (those that are not Actinopterygii) from the tree
# Prune the tree
tree_act <- drop.tip(tree, species_to_drop)
# Align data and tree
datF <- summary_data_act %>%
column_to_rownames("species_underscored")
dat_tree <- datF %>%
filter(row.names(.) %in% tree_act$tip.label)
# Add missing species
missing_species <- setdiff(tree_act$tip.label, row.names(dat_tree))
dat_tree_NA <- data.frame(matrix(NA, nrow = length(missing_species), ncol = ncol(datF)))
row.names(dat_tree_NA) <- missing_species
colnames(dat_tree_NA) <- colnames(datF)
Data <- rbind(dat_tree, dat_tree_NA)
Data <- Data[match(tree_act$tip.label, row.names(Data)), ]
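The alignment step above keeps only the species present in the tree, pads any tip without data with `NA`, and reorders the rows to follow the tip labels. A minimal base R sketch of the same logic follows. The character vector `tip_labels` stands in for `tree_act$tip.label`, and all names and values are invented for illustration.

```r
# Stand-in for tree_act$tip.label (hypothetical tip order)
tip_labels <- c("sp_c", "sp_a", "sp_d", "sp_b")

# Trait table keyed by row names; sp_e is absent from the tree,
# and sp_c and sp_d have no trait data
traits <- data.frame(
  cell_area = c(0.1, 0.9, 0.4),
  row.names = c("sp_a", "sp_b", "sp_e")
)

# 1. Keep only species that are tips in the tree
in_tree <- traits[rownames(traits) %in% tip_labels, , drop = FALSE]

# 2. Build NA rows for tips that lack data
missing <- setdiff(tip_labels, rownames(in_tree))
padding <- data.frame(
  matrix(NA_real_, nrow = length(missing), ncol = ncol(traits),
         dimnames = list(missing, colnames(traits)))
)

# 3. Combine and reorder so that row i corresponds to tip i
aligned <- rbind(in_tree, padding)
aligned <- aligned[match(tip_labels, rownames(aligned)), , drop = FALSE]
aligned
```

When the tree has already been pruned to the species in the trait table, as in the code above, the padding step adds no rows; it only matters if the tree retains tips without data.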
# Plot with species names
circ_names_act <- ggtree(tree_act, layout = "fan", open.angle = 15, branch.length="none") +
geom_tiplab(offset = 8, hjust = 0, size = 0.9)
circ_names_act <- rotate_tree(circ_names_act, 90)
circ_names_act
# Create a new plot with heatmap for each trait using a single scale
tree_data_act <- gheatmap(
circ_names_act,
Data,
width = 0.2,
offset = 0,
colnames_offset_x = 0,
colnames_offset_y = 0,
font.size = 4,
hjust = 0
)
tree_data_act
# Apply the same scale for all traits
tree_data_act <- tree_data_act +
  scale_fill_viridis_c(option = "H", name = "Normalised Cell Traits", na.value = "grey90") +
  theme(
    # ggplot2 >= 3.5.0: place the legend inside the panel at the given coordinates
    legend.position = "inside",
    legend.position.inside = c(0.59, 0.55),
    legend.title.position = "top",
    legend.title = element_text(hjust = 0.5),  # centre the legend title
    legend.direction = "horizontal",
    legend.key = element_blank(),
    legend.background = element_blank(),
    legend.key.width = unit(0.9, "cm"),
    legend.key.height = unit(0.7, "cm")
  )
# ------------------------------------------------------------------------------
summary_data_no_act <- dat %>%
filter(class != "Actinopterygii") %>%
group_by(species_underscored) %>%
summarise(across(c(cell_area, cell_volume, nucleus_area, nucleus_volume, mcv),
~ mean(., na.rm = TRUE)),
.groups = "drop") %>%
rename(
"Cell area" = cell_area,
"Cell volume" = cell_volume,
"Nucleus area" = nucleus_area,
"Nucleus volume" = nucleus_volume,
"MCV" = mcv
)
summary_data_no_act <- summary_data_no_act %>%
mutate(across(c("Cell area", "Cell volume", "Nucleus area", "Nucleus volume", "MCV"),
~ (.-min(., na.rm = TRUE)) / (max(., na.rm = TRUE) - min(., na.rm = TRUE)))) # scale min-max by variable
summary_data_no_act
## # A tibble: 65 × 6
## species_underscored `Cell area` `Cell volume` `Nucleus area` `Nucleus volume`
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Aetobatus_narinari 0.133 0.0759 NaN NaN
## 2 Aptychotrema_rostr… 0.153 NaN 0.185 NaN
## 3 Bathyraja_parmifera 0.140 NaN 0.327 NaN
## 4 Bathytoshia_centro… 0.145 0.0845 0.217 0.249
## 5 Carcharhinus_brach… 0.0800 NaN 0.135 NaN
## 6 Carcharhinus_falci… 0.0762 0.0390 NaN NaN
## 7 Carcharhinus_leucas 0.0790 0.0404 NaN NaN
## 8 Carcharhinus_macul… 0.0178 0.00194 0.0498 0.0335
## 9 Carcharhinus_melan… 0.0424 NaN 0.103 NaN
## 10 Carcharhinus_milbe… 0.114 0.0794 NaN NaN
## # ℹ 55 more rows
## # ℹ 1 more variable: MCV <dbl>
unique(summary_data_no_act$species_underscored)
## [1] "Aetobatus_narinari" "Aptychotrema_rostrata"
## [3] "Bathyraja_parmifera" "Bathytoshia_centroura"
## [5] "Carcharhinus_brachyurus" "Carcharhinus_falciformis"
## [7] "Carcharhinus_leucas" "Carcharhinus_maculipinnis"
## [9] "Carcharhinus_melanopterus" "Carcharhinus_milberti"
## [11] "Carcharhinus_obscurus" "Carcharhinus_plumbeus"
## [13] "Centroscymnus_coelolepis" "Centroscymnus_crepidater"
## [15] "Centroscymnus_owstoni" "Chiloscyllium_punctatum"
## [17] "Dipturus_batis" "Dipturus_chilensis"
## [19] "Dipturus_laevis" "Etmopterus_brachyurus"
## [21] "Etmopterus_granulosus" "Fluvitrygon_signifer"
## [23] "Galeocerdo_cuvier" "Geotria_australis"
## [25] "Ginglymostoma_cirratum" "Hemiscyllium_ocellatum"
## [27] "Hemitrygon_bennettii" "Heterodontus_francisci"
## [29] "Hypanus_americanus" "Isurus_oxyrinchus"
## [31] "Lamna_nasus" "Lampetra_fluviatilis"
## [33] "Lampetra_planeri" "Leucoraja_erinaceus"
## [35] "Leucoraja_ocellata" "Mustelus_canis"
## [37] "Myxine_glutinosa" "Nebrius_ferrugineus"
## [39] "Negaprion_brevirostris" "Neoceratodus_forsteri"
## [41] "Neotrygon_kuhlii" "Orectolobus_ornatus"
## [43] "Oxynotus_bruniensis" "Oxynotus_centrina"
## [45] "Petromyzon_marinus" "Prionace_glauca"
## [47] "Proscymnodon_plunketi" "Protopterus_aethiopicus"
## [49] "Protopterus_annectens" "Raja_clavata"
## [51] "Raja_montagui" "Rhizoprionodon_terraenovae"
## [53] "Rostroraja_eglanteria" "Scyliorhinus_canicula"
## [55] "Scyliorhinus_stellaris" "Sphyrna_lewini"
## [57] "Sphyrna_mokarran" "Sphyrna_tiburo"
## [59] "Sphyrna_tudes" "Sphyrna_zygaena"
## [61] "Squalus_acanthias" "Squatina_australis"
## [63] "Squatina_squatina" "Tetronarce_nobiliana"
## [65] "Torpedo_torpedo"
# Identify species to drop
species_to_drop <- setdiff(tree$tip.label, summary_data_no_act$species_underscored)
# Prune the tree
tree_no_act <- drop.tip(tree, species_to_drop)
# Align data and tree
datF <- summary_data_no_act %>%
column_to_rownames("species_underscored")
dat_tree <- datF %>%
filter(row.names(.) %in% tree_no_act$tip.label)
# Add missing species
missing_species <- setdiff(tree_no_act$tip.label, row.names(dat_tree))
dat_tree_NA <- data.frame(matrix(NA, nrow = length(missing_species), ncol = ncol(datF)))
row.names(dat_tree_NA) <- missing_species
colnames(dat_tree_NA) <- colnames(datF)
Data <- rbind(dat_tree, dat_tree_NA)
Data <- Data[match(tree_no_act$tip.label, row.names(Data)), ]
# Plot with species names
circ_names_no_act <- ggtree(tree_no_act, layout = "fan", open.angle = 15, branch.length="none") +
geom_tiplab(offset = 4, hjust = 0, size = 2)
circ_names_no_act <- rotate_tree(circ_names_no_act, 90)
circ_names_no_act
# Create a new plot with heatmap for each trait using a single scale
tree_data_no_act <- gheatmap(
circ_names_no_act,
Data,
width = 0.2,
offset = 0,
colnames_offset_x = 0,
colnames_offset_y = 0,
font.size = 3,
hjust = 0
)
tree_data_no_act
# Apply the same scale for all traits
tree_data_no_act <- tree_data_no_act +
  scale_fill_viridis_c(option = "H",
                       name = "Normalised Cell Traits",
                       na.value = "grey90") +
  theme(
    # ggplot2 >= 3.5.0: place the legend inside the panel at the given coordinates
    legend.position = "inside",
    legend.position.inside = c(0.6, 0.55),
    legend.title.position = "top",
    legend.title = element_text(hjust = 0.5),  # centre the legend title
    legend.direction = "horizontal",
    legend.key = element_blank(),
    legend.background = element_blank(),
    legend.key.width = unit(1, "cm"),
    legend.key.height = unit(0.7, "cm")
  )
tree_data_no_act
# Extract the list of species names reported in the original dataset
species_list <- dat$species_reported
# Validate the species names using FishBase to ensure accuracy
validated_species <- validate_names(species_list)
# Load the complete taxonomy backbone from FishBase
taxonomy_data <- load_taxa()
# Filter the taxonomy data to retain only the species that were validated
# Select key taxonomic ranks: Class, Order, Family, Genus, and Species
# Convert the column names to lower case and append '_fish_base' to indicate that the source is FishBase
taxonomy_fb <- taxonomy_data %>%
filter(Species %in% validated_species) %>%
select(Class, Order, Family, Genus, Species) %>%
rename_with(~ tolower(.x)) %>%
rename_with(~ paste0(.x, "_fish_base"))
# Join the FishBase taxonomy information back to the original dataset
dat <- dat %>%
left_join(taxonomy_fb, by = c("species_reported" = "species_fish_base"))
# A quick inspection of the information extracted from FishBase reveals that many species names are not contained in FishBase, resulting in NA values for the backbone taxonomy of several species.
# Check the column names and select the most relevant columns
names(dat)
## [1] "species_reported" "double_checked" "database"
## [4] "key" "body_mass_gram" "sex"
## [7] "life_stage" "lat_dec" "long_dec"
## [10] "location_description" "sample_size" "number_of_specimens"
## [13] "estimate_error_type" "cell_length" "cell_length_error"
## [16] "cell_width" "cell_width_error" "cell_area"
## [19] "cell_area_error" "cell_volume" "cell_volume_error"
## [22] "mcv" "mcv_error" "nucleus_length"
## [25] "nucleus_length_error" "nucleus_width" "nucleus_width_error"
## [28] "nucleus_area" "nucleus_area_error" "nucleus_volume"
## [31] "nucleus_volume_error" "notes" "phylum"
## [34] "class" "order" "family"
## [37] "genus" "species" "source"
## [40] "taxo_level" "isMarine" "isBrackish"
## [43] "isFresh" "realm" "species_underscored"
## [46] "cell_length_sd" "cell_width_sd" "cell_area_sd"
## [49] "cell_volume_sd" "mcv_sd" "nucleus_length_sd"
## [52] "nucleus_width_sd" "nucleus_area_sd" "nucleus_volume_sd"
## [55] "address" "country_collection" "subcontinent"
## [58] "class_fish_base" "order_fish_base" "family_fish_base"
## [61] "genus_fish_base"
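The `NA` values in the `*_fish_base` columns arise because the join keeps every reported name and fills the taxonomy only where the name has an exact match in the backbone. The sketch below illustrates this with a base R left join (`merge(..., all.x = TRUE)`) on a tiny invented backbone, and lists the reported names left without a match. The data frames and the placeholder name "Genus hypotheticus" are assumptions for illustration and do not come from FishBase or ErythroCite.

```r
# Reported names; "Genus hypotheticus" is a placeholder for a name
# that has no exact match in the backbone
reported <- data.frame(
  species_reported = c("Salmo salar", "Gadus morhua", "Genus hypotheticus"),
  cell_area        = c(100, 60, 80)   # invented values
)

# Tiny stand-in for the taxonomy backbone
backbone <- data.frame(
  species_fish_base = c("Salmo salar", "Gadus morhua", "Esox lucius"),
  family_fish_base  = c("Salmonidae", "Gadidae", "Esocidae")
)

# Left join: all reported rows are kept; unmatched names get NA taxonomy
joined <- merge(reported, backbone,
                by.x = "species_reported", by.y = "species_fish_base",
                all.x = TRUE)

# Reported names without backbone information
unmatched <- unique(joined$species_reported[is.na(joined$family_fish_base)])
unmatched
```

Listing the unmatched names in this way gives a quick estimate of how many records lack backbone taxonomy before the export.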
# Select the most relevant columns and reorder them where needed
ErythroCite_DB_v1.0.0 <- dat %>%
select(key, phylum, class, order, family, genus, species, species_reported, species_underscored,
class_fish_base, order_fish_base, family_fish_base, genus_fish_base,
database,
location_description, lat_dec, long_dec, country_collection, subcontinent, realm,
body_mass_gram, sex, life_stage, number_of_specimens,
cell_length, cell_width, cell_area, cell_volume, mcv,
nucleus_length, nucleus_width, nucleus_area, nucleus_volume,
cell_length_sd, cell_width_sd, cell_area_sd, cell_volume_sd, mcv_sd,
nucleus_length_sd, nucleus_width_sd, nucleus_area_sd, nucleus_volume_sd, notes)
names(ErythroCite_DB_v1.0.0)
## [1] "key" "phylum" "class"
## [4] "order" "family" "genus"
## [7] "species" "species_reported" "species_underscored"
## [10] "class_fish_base" "order_fish_base" "family_fish_base"
## [13] "genus_fish_base" "database" "location_description"
## [16] "lat_dec" "long_dec" "country_collection"
## [19] "subcontinent" "realm" "body_mass_gram"
## [22] "sex" "life_stage" "number_of_specimens"
## [25] "cell_length" "cell_width" "cell_area"
## [28] "cell_volume" "mcv" "nucleus_length"
## [31] "nucleus_width" "nucleus_area" "nucleus_volume"
## [34] "cell_length_sd" "cell_width_sd" "cell_area_sd"
## [37] "cell_volume_sd" "mcv_sd" "nucleus_length_sd"
## [40] "nucleus_width_sd" "nucleus_area_sd" "nucleus_volume_sd"
## [43] "notes"
# Export the database as a CSV file
write.csv(ErythroCite_DB_v1.0.0, "../manuscript/ErythroCite_DB_v1.0.0.csv", row.names = FALSE)
# Export the database as an Excel file
writexl::write_xlsx(ErythroCite_DB_v1.0.0, "../manuscript/ErythroCite_DB_v1.0.0.xlsx")
Pottier, P., Burke, S., Drobniak, S. M., Lagisz, M. & Nakagawa, S. Sexual (in)equality? A meta-analysis of sex differences in thermal acclimation capacity across ectotherms. Functional Ecology 35, 2663–2678 (2021).
Benfey, T. J. & Sutterlin, A. M. The haematology of triploid landlocked Atlantic salmon, Salmo salar L. Journal of Fish Biology 24, 333–338 (1984).
Gregory, T. R. Animal genome size database. http://www.genomesize.com/. (2024).
session_info() %>%
details(summary = 'Current Session Information', open = TRUE)
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.3.2 (2023-10-31)
os macOS 26.0
system aarch64, darwin20
ui X11
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz Europe/Amsterdam
date 2025-09-29
pandoc 3.4 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/aarch64/ (via rmarkdown)
quarto 1.6.42 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/quarto
─ Packages ───────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
abind 1.4-8 2024-09-12 [1] CRAN (R 4.3.3)
ape * 5.8-1 2024-12-16 [1] CRAN (R 4.3.3)
aplot 0.2.5 2025-02-27 [1] CRAN (R 4.3.3)
backports 1.5.0 2024-05-23 [1] CRAN (R 4.3.3)
bibtex 0.5.1 2023-01-26 [1] CRAN (R 4.3.3)
blob 1.2.4 2023-03-17 [1] CRAN (R 4.3.3)
bookdown 0.43 2025-04-15 [1] CRAN (R 4.3.3)
broom 1.0.8 2025-03-28 [1] CRAN (R 4.3.3)
bslib 0.9.0 2025-01-30 [1] CRAN (R 4.3.3)
cachem 1.1.0 2024-05-16 [1] CRAN (R 4.3.3)
car 3.1-3 2024-09-27 [1] CRAN (R 4.3.3)
carData 3.0-5 2022-01-06 [1] CRAN (R 4.3.3)
class 7.3-23 2025-01-01 [1] CRAN (R 4.3.3)
classInt 0.4-11 2025-01-08 [1] CRAN (R 4.3.3)
cli 3.6.5 2025-04-23 [1] CRAN (R 4.3.3)
clipr 0.8.0 2022-02-22 [1] CRAN (R 4.3.3)
codetools 0.2-20 2024-03-31 [1] CRAN (R 4.3.3)
cowplot * 1.1.3 2024-01-22 [1] CRAN (R 4.3.1)
curl 6.2.3 2025-05-24 [1] CRAN (R 4.3.3)
data.table 1.17.4 2025-05-26 [1] CRAN (R 4.3.3)
data.tree 1.1.0 2023-11-12 [1] CRAN (R 4.3.3)
DataExplorer * 0.8.3 2024-01-24 [1] CRAN (R 4.3.1)
DBI 1.2.3 2024-06-02 [1] CRAN (R 4.3.3)
dbplyr 2.5.0 2024-03-19 [1] CRAN (R 4.3.1)
desc 1.4.3 2023-12-10 [1] CRAN (R 4.3.3)
details * 0.4.0 2025-02-09 [1] CRAN (R 4.3.3)
digest 0.6.37 2024-08-19 [1] CRAN (R 4.3.3)
dplyr * 1.1.4 2023-11-17 [1] CRAN (R 4.3.1)
duckdb 1.3.2 2025-07-09 [1] CRAN (R 4.3.3)
duckdbfs 0.1.0 2025-04-04 [1] CRAN (R 4.3.3)
e1071 1.7-16 2024-09-16 [1] CRAN (R 4.3.3)
evaluate 1.0.4 2025-06-18 [1] CRAN (R 4.3.3)
farver 2.1.2 2024-05-13 [1] CRAN (R 4.3.3)
fastmap 1.2.0 2024-05-15 [1] CRAN (R 4.3.3)
fishualize * 0.2.3 2022-03-08 [1] CRAN (R 4.3.0)
Formula 1.2-5 2023-02-24 [1] CRAN (R 4.3.3)
fs 1.6.6 2025-04-12 [1] CRAN (R 4.3.3)
generics 0.1.4 2025-05-09 [1] CRAN (R 4.3.3)
ggfun 0.1.8 2024-12-03 [1] CRAN (R 4.3.3)
ggplot2 * 3.5.2 2025-04-09 [1] CRAN (R 4.3.3)
ggplotify 0.1.2 2023-08-09 [1] CRAN (R 4.3.0)
ggpubr * 0.6.0 2023-02-10 [1] CRAN (R 4.3.0)
ggsignif 0.6.4 2022-10-13 [1] CRAN (R 4.3.0)
ggthemes * 5.1.0 2024-02-10 [1] CRAN (R 4.3.1)
ggtree * 3.10.1 2024-02-27 [1] Bioconductor 3.18 (R 4.3.2)
glue 1.8.0 2024-09-30 [1] CRAN (R 4.3.3)
gridExtra 2.3 2017-09-09 [1] CRAN (R 4.3.3)
gridGraphics 0.5-1 2020-12-13 [1] CRAN (R 4.3.3)
gtable 0.3.6 2024-10-25 [1] CRAN (R 4.3.3)
htmltools 0.5.8.1 2024-04-04 [1] CRAN (R 4.3.3)
htmlwidgets 1.6.4 2023-12-06 [1] CRAN (R 4.3.1)
httr 1.4.7 2023-08-15 [1] CRAN (R 4.3.0)
igraph 2.1.4 2025-01-23 [1] CRAN (R 4.3.3)
jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.3.3)
jsonlite 2.0.0 2025-03-27 [1] CRAN (R 4.3.3)
kableExtra * 1.4.0 2024-01-24 [1] CRAN (R 4.3.1)
KernSmooth 2.23-26 2025-01-01 [1] CRAN (R 4.3.3)
knitr 1.50 2025-03-16 [1] CRAN (R 4.3.3)
labeling 0.4.3 2023-08-29 [1] CRAN (R 4.3.3)
lattice 0.22-7 2025-04-02 [1] CRAN (R 4.3.3)
lazyeval 0.2.2 2019-03-15 [1] CRAN (R 4.3.3)
lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.3.3)
lubridate 1.9.4 2024-12-08 [1] CRAN (R 4.3.3)
magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.3)
maps 3.4.3 2025-05-26 [1] CRAN (R 4.3.3)
memoise 2.0.1 2021-11-26 [1] CRAN (R 4.3.3)
networkD3 0.4.1 2025-04-14 [1] CRAN (R 4.3.3)
nlme 3.1-168 2025-03-31 [1] CRAN (R 4.3.3)
patchwork 1.3.0 2024-09-16 [1] CRAN (R 4.3.3)
pillar 1.10.2 2025-04-05 [1] CRAN (R 4.3.3)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.3)
plyr 1.8.9 2023-10-02 [1] CRAN (R 4.3.3)
png 0.1-8 2022-11-29 [1] CRAN (R 4.3.3)
proxy 0.4-27 2022-06-09 [1] CRAN (R 4.3.3)
purrr 1.0.4 2025-02-05 [1] CRAN (R 4.3.3)
R6 2.6.1 2025-02-15 [1] CRAN (R 4.3.3)
ragg 1.4.0 2025-04-10 [1] CRAN (R 4.3.3)
RColorBrewer 1.1-3 2022-04-03 [1] CRAN (R 4.3.3)
Rcpp 1.1.0 2025-07-02 [1] CRAN (R 4.3.3)
RefManageR * 1.4.0 2022-09-30 [1] CRAN (R 4.3.0)
rfishbase * 5.0.1 2025-01-12 [1] CRAN (R 4.3.3)
rlang 1.1.6 2025-04-11 [1] CRAN (R 4.3.3)
rmarkdown 2.29 2024-11-04 [1] CRAN (R 4.3.3)
rnaturalearth * 1.0.1 2023-12-15 [1] CRAN (R 4.3.1)
rstatix 0.7.2 2023-02-01 [1] CRAN (R 4.3.0)
rstudioapi 0.17.1 2024-10-22 [1] CRAN (R 4.3.3)
sass 0.4.10 2025-04-11 [1] CRAN (R 4.3.3)
scales 1.4.0 2025-04-24 [1] CRAN (R 4.3.3)
sessioninfo * 1.2.3 2025-02-05 [1] CRAN (R 4.3.3)
sf 1.0-21 2025-05-15 [1] CRAN (R 4.3.3)
stringi 1.8.7 2025-03-27 [1] CRAN (R 4.3.3)
stringr 1.5.1 2023-11-14 [1] CRAN (R 4.3.1)
svglite 2.2.1 2025-05-12 [1] CRAN (R 4.3.3)
systemfonts 1.2.3 2025-04-30 [1] CRAN (R 4.3.3)
terra 1.8-50 2025-05-09 [1] CRAN (R 4.3.3)
textshaping 1.0.1 2025-05-01 [1] CRAN (R 4.3.3)
tibble * 3.2.1 2023-03-20 [1] CRAN (R 4.3.0)
tidygeocoder * 1.0.6 2025-03-31 [1] CRAN (R 4.3.3)
tidyr 1.3.1 2024-01-24 [1] CRAN (R 4.3.1)
tidyselect 1.2.1 2024-03-11 [1] CRAN (R 4.3.1)
tidytree 0.4.6 2023-12-12 [1] CRAN (R 4.3.1)
timechange 0.3.0 2024-01-18 [1] CRAN (R 4.3.3)
treeio 1.26.0 2023-11-06 [1] Bioconductor
units 0.8-7 2025-03-11 [1] CRAN (R 4.3.3)
utf8 1.2.5 2025-05-01 [1] CRAN (R 4.3.3)
vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.3.3)
viridisLite 0.4.2 2023-05-02 [1] CRAN (R 4.3.3)
withr 3.0.2 2024-10-28 [1] CRAN (R 4.3.3)
writexl 1.5.4 2025-04-15 [1] CRAN (R 4.3.3)
xfun 0.52 2025-04-02 [1] CRAN (R 4.3.3)
xml2 1.3.8 2025-03-14 [1] CRAN (R 4.3.3)
yaml 2.3.10 2024-07-26 [1] CRAN (R 4.3.3)
yulab.utils 0.2.0 2025-01-29 [1] CRAN (R 4.3.3)
[1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
* ── Packages attached to the search path.
──────────────────────────────────────────────────────────────────────────────